Evaluation disrupted Part I: It’s not if AI, it’s when AI, and that when is now…
Dr. Jeffrey Sheldon

Introduction

As the title says, AI is here, it's a disruptive innovation, and it's already disrupting evaluation apace whether we want it to or not. I've written about AI's role in research and evaluation in previous articles, but contrary to my earlier belief that AI and the social science research methods that provide the scaffolding for this work are complementary, I now believe AI will eventually replace that work wholly by cutting out the evaluator as we presently know them. As an aside, if you're contemplating a graduate degree in evaluation, I would advise against it. Your best bet would be a certificate so you understand evaluation fundamentals, followed by some sort of technology training or advanced degree, because that's where the game is, and will be, for the foreseeable future; I think you'll be much better off and happier for it. Why? I predict that those of us who already have the graduate-level education, training, and experience to conduct evaluations are going to be out of a job sooner rather than later, once the organizations we work with, or might work with, figure out they can use AI to do their own evaluations at a fraction of the cost. Who needs an evaluator when technology can do the job more efficiently, more effectively, and with, to put it in economic terms, fewer externalities? Plus, is not a vertical organizational structure that relies on in-house capacity better than a horizontal one that requires all manner of contractors and consultants? Optimally, if we're being honest, we should be working our way out of our roles as evaluators anyway by building capacity and then handing it over to organizations. Heresy, I know. Besides, given that many organizations use external evaluators unless they have deep enough pockets for their own in-house person, in this moment of economic contraction there will likely be less money available for evaluation, so an organization's ability to conduct its own evaluations using a GPT just makes sense. Again, why hire an external evaluator or keep an internal evaluator on payroll when an organization has the capacity, or soon will, to conduct its own evaluations?

That being said, my other prediction is that if we want to have any semblance of evaluation work in the future, our new role will have to be providing technical assistance to organizations so they can set up their own GPTs to conduct their own evaluations in toto. In essence, an evaluator or researcher might assist in writing the GPT as a point of departure, but organization staff members will do the rest.

So, out of curiosity, I decided to test my first prediction using two sample GPTs I created, framed by this basic question: what might non-profit organizations (NPOs), not-for-profit organizations (NFPOs), and non-governmental organizations (NGOs) want AI to do in terms of self-guided evaluation of their programs, projects, or organizational functions and processes? Below is the first prompt I put into ChatGPT-4o; the results follow. It took more than a couple of tries to produce a GPT that was remotely close to what I was trying to create, and to be honest I don't think I quite got there in the end. Oh well, it's a process and this is only Part I. This is a long piece, so settle in and read on.

Prompt 1

I need a GPT for a non-profit organization, not-for-profit organization, or non-government organization - regardless of size, location, or mission - so it can conduct its own program, project, or organizational function and process evaluations from start to finish without the assistance of either an internal or external evaluator. This GPT should be for people in those three types of organizations who want to evaluate their work but don't necessarily have any formal training or experience in conducting evaluations. Basically, this would be a stand-alone way for organization staff members to conduct their own, in-house evaluations. This GPT should help organizational staff members to be able to:

  • Allocate the necessary time, resources, personnel, and budget to conduct evaluations.
  • Determine the phenomenon/phenomena they want to evaluate.
  • Determine what they want to learn from the evaluation.
  • Conduct academic and grey literature reviews as the basis for their evaluations, if necessary.
  • Formulate evaluation questions based on either the academic or grey literature, or those that are relevant to the organization’s projects or programs.
  • Formulate testable hypotheses, as appropriate.
  • Determine the most appropriate evaluation design to answer the evaluation questions including exploratory, descriptive, quasi-experimental, experimental, etc.
  • Determine the most appropriate evaluation methods to answer the evaluation questions: qualitative, quantitative, or mixed.
  • Determine evaluation procedures.
  • Conduct evaluations ethically and responsibly.
  • Develop, or find existing and available quantitative data collection instruments including tests, surveys, and scales if quantitative data is going to be collected.
  • Develop, or find existing and available qualitative data collection instruments including: content analysis protocols, focus group protocols, interview protocols, and observation protocols if qualitative data is going to be collected.
  • Determine potential invalidities/errors and the ways to control for them through either design or statistical analysis.
  • Determine target population/s.
  • Determine the sampling frame, sample size, and sampling strategy, if necessary.
  • Protect human research subjects.
  • Successfully navigate the Institutional Review Board process, if necessary.
  • Obtain informed consent from human research subjects.
  • Provide instructions for administering data collection instruments.
  • Implement quantitative data collection instruments.
  • Implement qualitative data collection instruments.
  • Store and secure both quantitative and qualitative data.
  • Clean quantitative data.
  • Provide different approaches to quantitative data analysis based on data and evaluation questions.
  • Provide different approaches to qualitative data analysis based on data and evaluation questions.
  • Interpret the results of the quantitative or qualitative data analysis.
  • Make sense of the results of the quantitative or qualitative data analysis.
  • Make recommendations based on the results of the quantitative or qualitative data analysis.
  • Implement recommendations.
  • Strategically learn from the results of the evaluation.

Let’s see what this first prompt yielded, remembering what I asked it to do. As a caveat, with ChatGPT or any similar tool you should check that it actually did what you wanted; otherwise you will need to revise (and revise) the prompt until you get the results you’re looking for. It is not infallible. If you’re not clear at the outset about what you want it to do, it’s not going to give you what you really want, if that makes sense. You could probably pull all of this out of any evaluation text, but the beauty of AI is how efficiently it gets the job done.

GPT 1

“I've created a full General-Purpose Tool (GPT) for in-house evaluation tailored to non-profit, not-for-profit, and NGO staff without formal evaluation training. You can now view, edit, and build on it directly in the canvas. Let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.”                                              

General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings

Purpose

This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.

Step 1: Planning and Preparation

1. Allocate Resources

  • Identify evaluation lead(s) and supportive team members.
  • Dedicate time in the project cycle for evaluation (before, during, and after).
  • Estimate and secure budget for data collection tools, software, and potential participant incentives.

2. Define the Focus

  • Identify the phenomenon to evaluate (e.g., service effectiveness, stakeholder engagement).
  • Clarify what the organization wants to learn.
  • Use brainstorming, logic models, or Theory of Change frameworks to outline assumptions and goals.

3. Conduct a Literature Review

  • Search academic databases (Google Scholar, JSTOR, ERIC) and grey literature (NGO reports, government evaluations, and toolkits).
  • Summarize key findings that align with your context.

4. Formulate Evaluation Questions and Hypotheses

Example questions

  • To what extent did our mentoring program improve youth outcomes?
  • What were participant perceptions of our service model?
  • Optional: Form testable hypotheses (e.g., Participants receiving X will show greater Y than those not receiving X)

 Step 2: Design the Evaluation

5. Select an Evaluation Design

  • Exploratory (no clear outcome defined, early-stage understanding)
  • Descriptive (understand what is happening)
  • Quasi-experimental (compare groups without random assignment)
  • Experimental (randomized control trials)

6. Select Data Collection Methods

  • Quantitative: Surveys, tests, scales (e.g., Likert-style measures)
  • Qualitative: Interviews, focus groups, open-ended surveys, observations
  • Mixed methods: Combines both types

7. Identify Ethical Considerations

  • Ensure anonymity and confidentiality
  • Avoid coercion, especially in vulnerable populations
  • Prepare a simple informed consent form

8. Develop/Adapt Instruments

  • Use validated instruments when available (search tools from peer-reviewed studies or evaluation repositories)
  • If developing new tools, pilot-test them

Types

  • Quantitative: Scales (e.g., Rosenberg Self-Esteem Scale), custom surveys
  • Qualitative: Interview guides, focus group questions, observation checklists

9. Determine Sampling Strategy

  • Define target population (e.g., all program participants, volunteers, stakeholders)
  • Choose sampling strategy: random, purposive, convenience
  • Calculate appropriate sample size based on study goals
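
As an illustration of the arithmetic behind that last bullet, here is a minimal sketch in Python of one common sample-size calculation (Cochran’s formula with a finite population correction); the confidence level, margin of error, and population size are hypothetical, and an online calculator will get you to the same place:

  import math

  def cochran_sample_size(z=1.96, margin_of_error=0.05, proportion=0.5):
      # Cochran's formula: an initial estimate for an unlimited population
      return (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)

  def finite_population_correction(n0, population_size):
      # Adjust the estimate downward when the whole population is small
      return n0 / (1 + (n0 - 1) / population_size)

  n0 = cochran_sample_size()                   # about 384 at 95% confidence, +/-5% margin
  n = finite_population_correction(n0, 150)    # e.g., a population of 150 program participants
  print(math.ceil(n))                          # about 109 respondents needed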

Step 3: Implementation

10. Secure Ethical Approval (if needed)

  • If publishing externally or involving sensitive populations, seek IRB review
  • Use online platforms (e.g., IRBNet) or consult local university IRBs

11. Obtain Informed Consent

  • Explain purpose, risks, benefits, and rights
  • Use simple language; provide both verbal and written formats

12. Administer Data Collection Tools

  • For surveys: distribute electronically or in person
  • For interviews/focus groups: train facilitators; record (with consent)
  • Store responses in encrypted, password-protected folder

13. Data Cleaning and Storage

  • Quantitative: Remove incomplete or invalid responses
  • Qualitative: Transcribe recordings; remove identifying information
  • Store securely with access limited to evaluation team

Step 4: Data Analysis

14. Analyze Quantitative Data

  • Use tools like Excel, SPSS, or R
  • Common analyses

  1. Descriptive statistics (frequencies, means)
  2. Inferential statistics (t-tests, ANOVA, correlations)
  3. Visualize data with graphs and tables
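
To give a flavor of what the descriptive side of this looks like in practice, here is a minimal sketch using Python’s pandas library; the file name and column names are hypothetical placeholders for an organization’s own survey export:

  import pandas as pd

  # Hypothetical survey export: one row per respondent
  df = pd.read_csv("survey_responses.csv")

  # Descriptive statistics: frequencies for a categorical item, means for numeric items
  print(df["attendance"].value_counts())
  print(df[["age", "leadership_score"]].describe())

  # A simple grouped comparison to review before running any inferential test
  print(df.groupby("attendance")["leadership_score"].mean())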

15. Analyze Qualitative Data

  • Code responses into themes
  • Use thematic or content analysis
  • Software (optional): NVivo, Dedoose, MAXQDA

16. Address Potential Errors/Invalidities

  • Review for selection bias, recall bias, and data entry errors
  • Use triangulation (multiple sources/methods) to enhance credibility

Step 5: Interpretation and Use

17. Interpret Findings

  • Link results back to evaluation questions and hypotheses
  • Note limitations
  • Use quotes (qualitative) or key stats (quantitative) to illustrate findings 

18. Make Recommendations

  • Develop actionable, feasible suggestions based on data
  • Prioritize recommendations by impact and feasibility

19. Implement and Learn

  • Share findings with staff and stakeholders
  • Integrate insights into strategic planning
  • Schedule follow-up evaluations to assess changes

Resources & Templates (recommended for adaptation):

  • Logic Model Template (Kellogg Foundation)
  • Sample Informed Consent Form (CDC, WHO)
  • Evaluation Plan Template (CDC’s Framework for Program Evaluation)
  • IRB Navigator (for determining review needs)
  • Survey and Interview Protocol Repositories (e.g., Measurement Instrument Database for the Social Sciences)

Final Note

This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.

So, is this what I really wanted? No, not exactly; it’s just a litany of steps. It’s potentially useful for an organization that wants to begin doing its own evaluations, and it’s a point of departure, but it doesn’t actually go far enough in terms of detail and explanation. It’s nothing more than a framework, a guide, the kind of thing that could be found in any evaluation text as noted, and you’d likely still need an evaluator, or at least someone with a modicum of evaluation training, to provide technical assistance. It’s helpful, but not in the way I had hoped it would be helpful. What I want is for an organization staff member to say, “evaluate the effectiveness/process/outcomes/impacts of this program/project, etc.,” have said staff person add in some pertinent information, and voila, the evaluation is completed in short order. ChatGPT did, however, say “let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.” Why not? Here goes:

Prompt 2

Please add to the In House Evaluation GPT instructions for how to carry out each step, provide examples for each element of each step, provide specific tools for each step as necessary, and do a walk-through of the steps of a basic quantitative data analysis and the steps of a basic qualitative data analysis.

Let’s see what we got.

GPT 2

Oops, crickets. Let’s try that again with more clarifying instructions. Hope this gets us there, or at least closer…

Prompt 3

The In House Evaluation GPT is good, but an organization needs step-by-step instructions to carry out each step including examples. Please add to the In House Evaluation GPT instructions to carry out steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide examples for each element of steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide specific tools for steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Do a walk-through of the steps of a basic quantitative data analysis and a basic qualitative data analysis.

I really hope the third GPT is better than the last. The question: was I clear enough in what I asked ChatGPT to do? Let’s take a look.

GPT 3

“The In-House Evaluation GPT has been expanded with detailed instructions, real-world examples, and specific tools for Steps 1 through 19. It also now includes walk-throughs for basic quantitative and qualitative data analysis. You can view and continue editing it in the canvas. Let me know if you'd like templates, sample instruments, or walk-throughs in downloadable formats.”

General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings

Purpose

This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.

Step-by-Step Evaluation Process with Instructions, Examples, and Tools

Step 1: Allocate Resources

Instructions

  • Identify who will lead the evaluation and any supporting team members.
  • Determine how many hours per week each person can dedicate.
  • Estimate budget for materials, software, training, and incentives.

Example

  • Assign program coordinator 10 hours/week and intern 5 hours/week.
  • Budget: $500 for survey software, $200 for incentives.

Tools

  • Resource Allocation Worksheet (Excel or Google Sheets)
  • Time Tracking Template (Toggl or Clockify)

Step 2: Define the Focus

Instructions

  • Identify the main topic or phenomenon to evaluate.
  • Ask: What problem are we addressing? What do we want to improve or understand?

Example

  • Phenomenon: Youth engagement in after-school programs
  • Goal: Understand how participation affects leadership skills 

Tools

  • Problem Tree Analysis
  • Theory of Change Template (UNDP or W.K. Kellogg Foundation)

Step 3: Conduct a Literature Review

Instructions

  • Search for existing studies using keywords related to your topic.
  • Look for both academic (Google Scholar, JSTOR) and grey literature (government reports, NGO case studies)

Example

  • Keywords: "youth engagement", "after-school program evaluation"
  • Found: A USAID toolkit on youth program impact 

Tools

  • Zotero or Mendeley for citation management
  • Evaluation Hub’s Grey Literature Repository 

Step 4: Formulate Evaluation Questions and Hypotheses

Instructions

  • Convert your goals into 1–3 clear questions.
  • Optional: State hypotheses if comparing groups or anticipating specific outcomes.

Example

  • Q1: To what extent does participation increase youth leadership skills?
  • Hypothesis: Youth who attend at least 10 sessions will show higher leadership scores.

Tools

  • Evaluation Question Worksheet
  • SMART Criteria Checklist

Step 5: Select an Evaluation Design

Instructions

Decide based on resources and purpose:

  • Exploratory: Little existing knowledge
  • Descriptive: Who, what, when, how often?
  • Quasi-experimental: Compare non-random groups
  • Experimental: Random assignment

Example

  • Chose quasi-experimental: compare students who opt in vs. opt out of a leadership module.

Tools                                         

  • Evaluation Design Matrix Template
  • CDC Evaluation Design Checklist 

Step 6: Select Data Collection Methods

Instructions

Match method to question:

  • Surveys for attitudes or frequency
  • Interviews/focus groups for deeper meaning

Example

  • Survey: leadership self-assessment
  • Interviews: youth perspectives on what helps them grow 

Tools

Step 7: Identify Ethical Considerations

Instructions

  • Minimize risk; protect identity
  • Include a clear informed consent process

Example

  • Assure participants data is anonymous and voluntary

Tools

  • Sample Informed Consent Form (WHO)
  • Ethical Evaluation Checklist (AEA or UNICEF)

Step 8: Develop or Find Instruments

Instructions

  • Adapt from validated tools if available
  • Pilot-test with 3–5 people

Example

  • Adapted the Youth Leadership Life Skills Development Scale (See: Robinson & Kolb, 2010)

Tools

  • Measurement Instrument Database for the Social Sciences (MIDSS)
  • Qualitative Instrument Guide (Harvard Family Research Project) 

Step 9: Determine Sampling Strategy

Instructions

  • Define who should participate (age, role, geography, etc.)
  • Choose strategy: random, purposive, convenience
  • Estimate needed sample size using online calculator

Example

  • Target: High school youth in 3 after-school programs
  • Sampling: Purposive
  • Sample size: 60 (calculated for confidence level 95%)

Tools

  • Sample Size Calculator (Qualtrics or Raosoft)
  • Sampling Strategy Guide

Step 10: Secure Ethical Approval

Instructions

  • If research involves risk, get IRB review
  • Universities or third-party IRBs may assist

Example

  • Applied to a university IRB using their community research protocol form

Tools

  • IRBNet or local university IRB portals
  • IRB Application Checklist

Step 11: Obtain Informed Consent

Instructions

  • Explain purpose, procedures, risks, and benefits
  • Use simple language; collect signatures or verbal consent

Example

  • “Your answers will help us improve our programs. You can skip any question.”

Tools

  • Informed Consent Script
  • Consent Form Template (CDC)

Step 12: Administer Data Collection Tools

Instructions

  • Decide how to distribute (in-person, email, online)
  • Provide clear instructions

Example

  • Administered surveys via tablets during program sessions

Tools

  • Google Forms or KoboToolbox
  • Interviewer Instruction Sheet

Step 13: Store and Clean Data

Instructions

  • Backup data securely (cloud and offline)
  • Clean quantitative data by checking for duplicates, blanks, and outliers

Example

  • Found 4 surveys with missing IDs; removed them

Tools

  • Excel or Google Sheets
  • SPSS, R, or Python (optional for large datasets)
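
For organizations comfortable going beyond a spreadsheet, here is a minimal Python/pandas sketch of the cleaning described above; the file and column names are hypothetical, and the same checks can be done by hand in Excel or Google Sheets:

  import pandas as pd

  df = pd.read_csv("raw_survey_data.csv")

  # Remove exact duplicate rows (e.g., double submissions)
  df = df.drop_duplicates()

  # Drop rows missing an ID or the key outcome measure
  df = df.dropna(subset=["respondent_id", "leadership_score"])

  # Flag out-of-range values (e.g., outside a 1-5 scale) for review, then exclude them
  flagged = df[(df["leadership_score"] < 1) | (df["leadership_score"] > 5)]
  df = df[df["leadership_score"].between(1, 5)]

  df.to_csv("clean_survey_data.csv", index=False)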

Step 14: Analyze Quantitative Data

Instructions

  • Run descriptive stats: mean, median, frequency
  • Run inferential stats if comparing groups (t-test, chi-square)

Example Walk-Through

Q: Does leadership score differ by attendance?

  • Step 1: Calculate average scores by group
  • Step 2: Run t-test using Excel or SPSS
  • Step 3: Result: Group A = 4.2, Group B = 3.5 (p < .05)

Tools

  • Excel, SPSS, or R
  • Free online t-test calculator (GraphPad)
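
Here is the same walk-through sketched in Python with the free SciPy library rather than Excel or SPSS; the column names and group labels are hypothetical, and the group means in the example above are illustrative rather than real results:

  import pandas as pd
  from scipy import stats

  df = pd.read_csv("clean_survey_data.csv")

  # Step 1: average leadership scores by attendance group
  high = df[df["attendance_group"] == "10+ sessions"]["leadership_score"]
  low = df[df["attendance_group"] == "fewer than 10"]["leadership_score"]
  print(high.mean(), low.mean())

  # Step 2: independent-samples t-test (Welch's version, which does not assume equal variances)
  t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)

  # Step 3: a p-value below .05 suggests the difference is unlikely to be chance alone
  print(f"t = {t_stat:.2f}, p = {p_value:.3f}")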

Step 15: Analyze Qualitative Data

Instructions

  • Transcribe interviews/focus groups
  • Code responses into themes (manually or using software)

Example Walk-Through

  • Step 1: Read 10 transcripts
  • Step 2: Code mentions of "growth" or "confidence"
  • Step 3: Create summary of common themes with quotes

Tools

  • Manual coding template (Word/Excel)
  • NVivo or Dedoose (optional)
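
For teams coding manually, here is a minimal Python sketch of a keyword-tallying first pass on the transcripts; the folder, file names, and starter codes are hypothetical, and this is only a starting point since genuine thematic analysis still requires a person reading the transcripts and refining the codes:

  from pathlib import Path
  from collections import Counter

  # Hypothetical folder of de-identified, plain-text transcripts
  transcripts = Path("transcripts").glob("*.txt")

  # Starter codes drawn from the evaluation questions; refine as new themes emerge
  codes = ["growth", "confidence", "leadership", "mentor"]

  counts = Counter()
  quotes = {code: [] for code in codes}

  for path in transcripts:
      for line in path.read_text(encoding="utf-8").splitlines():
          lowered = line.lower()
          for code in codes:
              if code in lowered:
                  counts[code] += 1
                  quotes[code].append((path.name, line.strip()))  # keep the quote for the report

  print(counts)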

Step 16: Address Potential Errors/Invalidities

Instructions

  • Identify sources of error (bias, measurement flaws)
  • Use triangulation (data/method/personnel) for credibility

Example

  • Used both surveys and interviews to assess engagement

Tools

  • Validity Threats Checklist (Trochim)
  • Triangulation Planning Worksheet

Step 17: Interpret Findings

Instructions

  • Return to original questions
  • Compare findings with literature and logic model

Example

  • “Youth who participated more often reported higher confidence, aligning with Robinson’s model (2010).”

Tools

  • Interpretation Worksheet
  • Summary Report Template

Step 18: Make Recommendations

Instructions

  • Create 2–4 practical recommendations
  • Prioritize based on impact and feasibility 

Example

  • Offer leadership program more frequently; assign mentors

Tools

  • Recommendation Matrix
  • Action Planning Template (Community Tool Box)

Step 19: Implement and Learn

Instructions

  • Share results internally and externally
  • Integrate learnings into planning
  • Schedule next evaluation

Example

  • Shared report with board and local schools; updated curriculum for next year

Tools

  • Evaluation Utilization Checklist
  • Learning Log Template

Final Note

This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.

So, again, is this what I really wanted? No, not exactly. It’s still a litany of steps, now with the addition of examples and tools, and like GPT 1 it is potentially useful for an organization that wants to begin doing its own evaluations with technical assistance from an evaluator, or at least someone with a modicum of evaluation training. The point, however, is to eventually get away from evaluators and technical assistants, and neither GPT actually gets us there. I put together a partial side-by-side comparison of GPT 1 and GPT 3 so you can judge whether GPT 3 is an improvement:

[Table: partial side-by-side comparison of GPT 1 and GPT 3]

To me, GPT 3 doesn’t provide enough detail and explanation. While the expanded In House Evaluation GPT is better, an organization will need even more detailed instructions to carry out each step, beyond simply being told what to do with a few measly examples and tools. For example, in Step 19 GPT 3 says to share results internally and externally, integrate learnings into planning, and schedule the next evaluation. Okay, so how does an organization share results internally and externally, how does it integrate learning into planning, and how does it schedule the next evaluation? The examples are, at best, vague and need more context and detailed explanation if they are going to be used as guidance. As for the tools, the next GPT should explain how each tool is used and when it’s used, give an example or two of how it’s been used by other organizations, explain how to find the tools, and provide instructions for their use. It’s helpful, but not in the way I envision it being helpful.

And that concludes Part I. In Part II, I am going to attempt one more GPT in this series and then move on to creating a GPT for an actual program or project evaluation, basing its construction on actual program data. Wish me luck. Thanks for reading. As always, I encourage your thoughts and comments.

