Evaluation disrupted Part I: It’s not if AI, it’s when AI, and that when is now…
Dr. Jeffrey Sheldon

Introduction

As the title says, AI is here, it's a disruptive innovation, and it's already disrupting evaluation apace whether we want it to or not. I've written about AI's role in research and evaluation in previous articles, but contrary to my earlier belief that AI and the social science research methods that provide the scaffolding for this work are complementary, I now believe AI will eventually replace that work wholly by cutting out the evaluator as we presently know them. As an aside, if you're contemplating a graduate degree in evaluation, I would advise against it. Your best bet would be a certificate so you understand evaluation fundamentals, followed by some sort of technology training or advanced degree, because that's where the game is, and will be, for the foreseeable future; I think you'll be much better off and happier for it. Why? I predict that those of us who already have the graduate-level education, training, and experience to conduct evaluations are going to be out of a job sooner rather than later, once the organizations we work with, or might work with, figure out they can use AI to do their own evaluations at a fraction of the cost. Who needs an evaluator when technology can do the job more efficiently, more effectively, and with, to put it in economic terms, fewer externalities? Plus, is not a vertical organizational structure that relies on in-house capacity better than a horizontal one that requires all manner of contractors and consultants? Optimally, if we're being honest, we should be working our way out of our roles as evaluators anyway by building capacity and then handing it over to organizations. Heresy, I know. Besides, given that many organizations use external evaluators unless they have deep enough pockets for their own in-house person, in this moment of economic contraction there will likely be less money available for evaluation, so an organization's ability to conduct its own evaluations using a GPT just makes sense. Again, why hire an external evaluator or keep an internal evaluator on payroll when an organization has the capacity, or soon will, to conduct its own evaluations?

That being said, my other prediction is that if we want to have any semblance of evaluation work in the future, our new role will have to be providing technical assistance to organizations so they can set up their own GPTs to conduct their own evaluations in toto. In essence, an evaluator or researcher might assist in writing the GPT as a point of departure, but organization staff members will do the rest.

So, out of curiosity, I decided to test my first prediction using two sample GPTs I created, framed by this basic question: what might non-profit organizations (NPOs), not-for-profit organizations (NFPOs), and non-governmental organizations (NGOs) want AI to do in terms of self-guided evaluation of their programs, projects, or organizational functions and processes? Below is the first prompt I put into ChatGPT-4o; the results follow. It took more than a couple of tries to produce a GPT that was remotely close to what I was trying to create, and to be honest I don't think I quite got there in the end. Oh well, it's a process and this is only Part I. This is a long piece, so settle in and read on.

Prompt 1

I need a GPT for a non-profit organization, not-for-profit organization, or non-government organization - regardless of size, location, or mission - so it can conduct its own program, project, or organizational function and process evaluations from start to finish without the assistance of either an internal or external evaluator. This GPT should be for people in those three types of organizations who want to evaluate their work but don't necessarily have any formal training or experience in conducting evaluations. Basically, this would be a stand-alone way for organization staff members to conduct their own, in-house evaluations. This GPT should help organizational staff members to be able to:

  • Allocate the necessary time, resources, personnel, and budget to conduct evaluations.
  • Determine the phenomenon/phenomena they want to evaluate.
  • Determine what they want to learn from the evaluation.
  • Conduct academic and grey literature reviews as the basis for their evaluations, if necessary.
  • Formulate evaluation questions based on either the academic or grey literature, or those that are relevant to the organization’s projects or programs.
  • Formulate testable hypotheses, as appropriate.
  • Determine the most appropriate evaluation design to answer the evaluation questions including exploratory, descriptive, quasi-experimental, experimental, etc.
  • Determine the most appropriate evaluation methods to answer the evaluation questions: qualitative, quantitative, or mixed.
  • Determine evaluation procedures.
  • Conduct evaluations ethically and responsibly.
  • Develop, or find existing and available quantitative data collection instruments including tests, surveys, and scales if quantitative data is going to be collected.
  • Develop, or find existing and available qualitative data collection instruments including: content analysis protocols, focus group protocols, interview protocols, and observation protocols if qualitative data is going to be collected.
  • Determine potential invalidities/errors and the ways to control for them through either design or statistical analysis.
  • Determine target population/s.
  • Determine the sampling frame, sample size, and sampling strategy, if necessary.
  • Protect human research subjects.
  • Successfully navigate the Institutional Review Board process, if necessary.
  • Obtain informed consent from human research subjects.
  • Provide instructions for administering data collection instruments.
  • Implement quantitative data collection instruments.
  • Implement qualitative data collection instruments.
  • Store and secure both quantitative and qualitative data.
  • Clean quantitative data.
  • Provide different approaches to quantitative data analysis based on data and evaluation questions.
  • Provide different approaches to qualitative data analysis based on data and evaluation questions.
  • Interpret the results of the quantitative or qualitative data analysis.
  • Make sense of the results of the quantitative or qualitative data analysis.
  • Make recommendations based on the results of the quantitative or qualitative data analysis.
  • Implement recommendations.
  • Strategically learn from the results of the evaluation.

Let’s see what this first prompt yielded, remembering what I asked it to do. As a caveat, with ChatGPT or any similar tool you should check that it actually did what you wanted; otherwise you will need to revise (and revise) the prompt until you get the results you’re looking for. It is not infallible. If you’re not clear at the outset about what you want it to do, it’s not going to give you what you really want, if that makes sense. You could probably pull all of this out of any evaluation text, but the beauty of AI is how efficiently it gets the job done.

GPT 1

“I've created a full General-Purpose Tool (GPT) for in-house evaluation tailored to non-profit, not-for-profit, and NGO staff without formal evaluation training. You can now view, edit, and build on it directly in the canvas. Let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.”                                              

General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings

Purpose

This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.

Step 1: Planning and Preparation

1. Allocate Resources

  • Identify evaluation lead(s) and supportive team members.
  • Dedicate time in the project cycle for evaluation (before, during, and after).
  • Estimate and secure budget for data collection tools, software, and potential participant incentives.

2. Define the Focus

  • Identify the phenomenon to evaluate (e.g., service effectiveness, stakeholder engagement).
  • Clarify what the organization wants to learn.
  • Use brainstorming, logic models, or Theory of Change frameworks to outline assumptions and goals.

3. Conduct a Literature Review

  • Search academic databases (Google Scholar, JSTOR, ERIC) and grey literature (NGO reports, government evaluations, and toolkits).
  • Summarize key findings that align with your context.

4. Formulate Evaluation Questions and Hypotheses

Example questions

  • To what extent did our mentoring program improve youth outcomes?
  • What were participant perceptions of our service model?
  • Optional: Form testable hypotheses (e.g., Participants receiving X will show greater Y than those not receiving X)

 Step 2: Design the Evaluation

5. Select an Evaluation Design

  • Exploratory (no clear outcome defined, early-stage understanding)
  • Descriptive (understand what is happening)
  • Quasi-experimental (compare groups without random assignment)
  • Experimental (randomized control trials)

6. Select Data Collection Methods

  • Quantitative: Surveys, tests, scales (e.g., Likert-style measures)
  • Qualitative: Interviews, focus groups, open-ended surveys, observations
  • Mixed methods: Combines both types

7. Identify Ethical Considerations

  • Ensure anonymity and confidentiality
  • Avoid coercion, especially in vulnerable populations
  • Prepare a simple informed consent form

8. Develop/Adapt Instruments

  • Use validated instruments when available (search tools from peer-reviewed studies or evaluation repositories)
  • If developing new tools, pilot-test them

Types

  • Quantitative: Scales (e.g., Rosenberg Self-Esteem Scale), custom surveys
  • Qualitative: Interview guides, focus group questions, observation checklists

9. Determine Sampling Strategy

  • Define target population (e.g., all program participants, volunteers, stakeholders)
  • Choose sampling strategy: random, purposive, convenience
  • Calculate appropriate sample size based on study goals
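
As an illustration of the arithmetic behind that last bullet, here is a minimal sketch in Python of one common sample-size calculation (Cochran’s formula with a finite population correction); the confidence level, margin of error, and population size are hypothetical, and an online calculator will get you to the same place:

  import math

  def cochran_sample_size(z=1.96, margin_of_error=0.05, proportion=0.5):
      # Cochran's formula: an initial estimate for an unlimited population
      return (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)

  def finite_population_correction(n0, population_size):
      # Adjust the estimate downward when the whole population is small
      return n0 / (1 + (n0 - 1) / population_size)

  n0 = cochran_sample_size()                   # about 384 at 95% confidence, +/-5% margin
  n = finite_population_correction(n0, 150)    # e.g., a population of 150 program participants
  print(math.ceil(n))                          # about 109 respondents needed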

Step 3: Implementation

10. Secure Ethical Approval (if needed)

  • If publishing externally or involving sensitive populations, seek IRB review
  • Use online platforms (e.g., IRBNet) or consult local university IRBs

11. Obtain Informed Consent

  • Explain purpose, risks, benefits, and rights
  • Use simple language; provide both verbal and written formats

12. Administer Data Collection Tools

  • For surveys: distribute electronically or in person
  • For interviews/focus groups: train facilitators; record (with consent)
  • Store responses in encrypted, password-protected folder

13. Data Cleaning and Storage

  • Quantitative: Remove incomplete or invalid responses
  • Qualitative: Transcribe recordings; remove identifying information
  • Store securely with access limited to evaluation team

Step 4: Data Analysis

14. Analyze Quantitative Data

  • Use tools like Excel, SPSS, or R
  • Common analyses

  1. Descriptive statistics (frequencies, means)
  2. Inferential statistics (t-tests, ANOVA, correlations)
  3. Visualize data with graphs and tables
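
To give a flavor of what the descriptive side of this looks like in practice, here is a minimal sketch using Python’s pandas library; the file name and column names are hypothetical placeholders for an organization’s own survey export:

  import pandas as pd

  # Hypothetical survey export: one row per respondent
  df = pd.read_csv("survey_responses.csv")

  # Descriptive statistics: frequencies for a categorical item, means for numeric items
  print(df["attendance"].value_counts())
  print(df[["age", "leadership_score"]].describe())

  # A simple grouped comparison to review before running any inferential test
  print(df.groupby("attendance")["leadership_score"].mean())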

15. Analyze Qualitative Data

  • Code responses into themes
  • Use thematic or content analysis
  • Software (optional): NVivo, Dedoose, MAXQDA

16. Address Potential Errors/Invalidities

  • Review for selection bias, recall bias, and data entry errors
  • Use triangulation (multiple sources/methods) to enhance credibility

Step 5: Interpretation and Use

17. Interpret Findings

  • Link results back to evaluation questions and hypotheses
  • Note limitations
  • Use quotes (qualitative) or key stats (quantitative) to illustrate findings 

18. Make Recommendations

  • Develop actionable, feasible suggestions based on data
  • Prioritize recommendations by impact and feasibility

19. Implement and Learn

  • Share findings with staff and stakeholders
  • Integrate insights into strategic planning
  • Schedule follow-up evaluations to assess changes

Resources & Templates (recommended for adaptation):

  • Logic Model Template (Kellogg Foundation)
  • Sample Informed Consent Form (CDC, WHO)
  • Evaluation Plan Template (CDC’s Framework for Program Evaluation)
  • IRB Navigator (for determining review needs)
  • Survey and Interview Protocol Repositories (e.g., Measurement Instrument Database for the Social Sciences)

Final Note

This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.

So, is this what I really wanted? No, not exactly; it’s just a litany of steps. It’s potentially useful for an organization that wants to begin doing its own evaluations, and it’s a point of departure, but it doesn’t actually go far enough in terms of detail and explanation. It’s nothing more than a framework, a guide, the kind of thing that could be found in any evaluation text as noted, and you’d likely still need an evaluator, or at least someone with a modicum of evaluation training, to provide technical assistance. It’s helpful, but not in the way I had hoped it would be helpful. What I want is for an organization staff member to say, “evaluate the effectiveness/process/outcomes/impacts of this program/project, etc.,” have said staff person add in some pertinent information, and voila, the evaluation is completed in short order. ChatGPT did, however, say “let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.” Why not? Here goes:

Prompt 2

Please add to the In House Evaluation GPT instructions for how to carry out each step, provide examples for each element of each step, provide specific tools for each step as necessary, and do a walk-through of the steps of a basic quantitative data analysis and the steps of a basic qualitative data analysis.

Let’s see what we got.

GPT 2

Oops, crickets. Let’s try that again with more clarifying instructions. Hope this gets us there, or at least closer…

Prompt 3

The In House Evaluation GPT is good, but an organization needs step-by-step instructions to carry out each step including examples. Please add to the In House Evaluation GPT instructions to carry out steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide examples for each element of steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide specific tools for steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Do a walk-through of the steps of a basic quantitative data analysis and a basic qualitative data analysis.

I really hope the third GPT is better than the last. The question: was I clear enough in what I asked ChatGPT to do? Let’s take a look.

GPT 3

“The In-House Evaluation GPT has been expanded with detailed instructions, real-world examples, and specific tools for Steps 1 through 19. It also now includes walk-throughs for basic quantitative and qualitative data analysis. You can view and continue editing it in the canvas. Let me know if you'd like templates, sample instruments, or walk-throughs in downloadable formats.”

General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings

Purpose

This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.

Step-by-Step Evaluation Process with Instructions, Examples, and Tools

Step 1: Allocate Resources

Instructions

  • Identify who will lead the evaluation and any supporting team members.
  • Determine how many hours per week each person can dedicate.
  • Estimate budget for materials, software, training, and incentives.

Example

  • Assign program coordinator 10 hours/week and intern 5 hours/week.
  • Budget: $500 for survey software, $200 for incentives.

Tools

  • Resource Allocation Worksheet (Excel or Google Sheets)
  • Time Tracking Template (Toggl or Clockify)

Step 2: Define the Focus

Instructions

  • Identify the main topic or phenomenon to evaluate.
  • Ask: What problem are we addressing? What do we want to improve or understand?

Example

  • Phenomenon: Youth engagement in after-school programs
  • Goal: Understand how participation affects leadership skills 

Tools

  • Problem Tree Analysis
  • Theory of Change Template (UNDP or W.K. Kellogg Foundation)

Step 3: Conduct a Literature Review

Instructions

  • Search for existing studies using keywords related to your topic.
  • Look for both academic (Google Scholar, JSTOR) and grey literature (government reports, NGO case studies)

Example

  • Keywords: "youth engagement", "after-school program evaluation"
  • Found: A USAID toolkit on youth program impact 

Tools

  • Zotero or Mendeley for citation management
  • Evaluation Hub’s Grey Literature Repository 

Step 4: Formulate Evaluation Questions and Hypotheses

Instructions

  • Convert your goals into 1–3 clear questions.
  • Optional: State hypotheses if comparing groups or anticipating specific outcomes.

Example

  • Q1: To what extent does participation increase youth leadership skills?
  • Hypothesis: Youth who attend at least 10 sessions will show higher leadership scores.

Tools

  • Evaluation Question Worksheet
  • SMART Criteria Checklist

Step 5: Select an Evaluation Design

Instructions

Decide based on resources and purpose:

  • Exploratory: Little existing knowledge
  • Descriptive: Who, what, when, how often?
  • Quasi-experimental: Compare non-random groups
  • Experimental: Random assignment

Example

  • Chose quasi-experimental: compare students who opt in vs. opt out of a leadership module.

Tools                                         

  • Evaluation Design Matrix Template
  • CDC Evaluation Design Checklist 

Step 6: Select Data Collection Methods

Instructions

Match method to question:

  • Surveys for attitudes or frequency
  • Interviews/focus groups for deeper meaning

Example

  • Survey: leadership self-assessment
  • Interviews: youth perspectives on what helps them grow 

Tools

Step 7: Identify Ethical Considerations

Instructions

  • Minimize risk; protect identity
  • Include a clear informed consent process

Example

  • Assure participants data is anonymous and voluntary

Tools

  • Sample Informed Consent Form (WHO)
  • Ethical Evaluation Checklist (AEA or UNICEF)

Step 8: Develop or Find Instruments

Instructions

  • Adapt from validated tools if available
  • Pilot-test with 3–5 people

Example

  • Adapted the Youth Leadership Life Skills Development Scale (See: Robinson & Kolb, 2010)

Tools

  • Measurement Instrument Database for the Social Sciences (MIDSS)
  • Qualitative Instrument Guide (Harvard Family Research Project) 

Step 9: Determine Sampling Strategy

Instructions

  • Define who should participate (age, role, geography, etc.)
  • Choose strategy: random, purposive, convenience
  • Estimate needed sample size using online calculator

Example

  • Target: High school youth in 3 after-school programs
  • Sampling: Purposive
  • Sample size: 60 (calculated for confidence level 95%)

Tools

  • Sample Size Calculator (Qualtrics or Raosoft)
  • Sampling Strategy Guide

Step 10: Secure Ethical Approval

Instructions

  • If research involves risk, get IRB review
  • Universities or third-party IRBs may assist

Example

  • Applied to a university IRB using their community research protocol form

Tools

  • IRBNet or local university IRB portals
  • IRB Application Checklist

Step 11: Obtain Informed Consent

Instructions

  • Explain purpose, procedures, risks, and benefits
  • Use simple language; collect signatures or verbal consent

Example

  • “Your answers will help us improve our programs. You can skip any question.”

Tools

  • Informed Consent Script
  • Consent Form Template (CDC)

Step 12: Administer Data Collection Tools

Instructions

  • Decide how to distribute (in-person, email, online)
  • Provide clear instructions

Example

  • Administered surveys via tablets during program sessions

Tools

  • Google Forms or KoboToolbox
  • Interviewer Instruction Sheet

Step 13: Store and Clean Data

Instructions

  • Backup data securely (cloud and offline)
  • Clean quantitative data by checking for duplicates, blanks, and outliers

Example

  • Found 4 surveys with missing IDs; removed them

Tools

  • Excel or Google Sheets
  • SPSS, R, or Python (optional for large datasets)
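
For organizations comfortable going beyond a spreadsheet, here is a minimal Python/pandas sketch of the cleaning described above; the file and column names are hypothetical, and the same checks can be done by hand in Excel or Google Sheets:

  import pandas as pd

  df = pd.read_csv("raw_survey_data.csv")

  # Remove exact duplicate rows (e.g., double submissions)
  df = df.drop_duplicates()

  # Drop rows missing an ID or the key outcome measure
  df = df.dropna(subset=["respondent_id", "leadership_score"])

  # Flag out-of-range values (e.g., outside a 1-5 scale) for review, then exclude them
  flagged = df[(df["leadership_score"] < 1) | (df["leadership_score"] > 5)]
  df = df[df["leadership_score"].between(1, 5)]

  df.to_csv("clean_survey_data.csv", index=False)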

Step 14: Analyze Quantitative Data

Instructions

  • Run descriptive stats: mean, median, frequency
  • Run inferential stats if comparing groups (t-test, chi-square)

Example Walk-Through

Q: Does leadership score differ by attendance?

  • Step 1: Calculate average scores by group
  • Step 2: Run t-test using Excel or SPSS
  • Step 3: Result: Group A = 4.2, Group B = 3.5 (p < .05)

Tools

  • Excel, SPSS, or R
  • Free online t-test calculator (GraphPad)
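
Here is the same walk-through sketched in Python with the free SciPy library rather than Excel or SPSS; the column names and group labels are hypothetical, and the group means in the example above are illustrative rather than real results:

  import pandas as pd
  from scipy import stats

  df = pd.read_csv("clean_survey_data.csv")

  # Step 1: average leadership scores by attendance group
  high = df[df["attendance_group"] == "10+ sessions"]["leadership_score"]
  low = df[df["attendance_group"] == "fewer than 10"]["leadership_score"]
  print(high.mean(), low.mean())

  # Step 2: independent-samples t-test (Welch's version, which does not assume equal variances)
  t_stat, p_value = stats.ttest_ind(high, low, equal_var=False)

  # Step 3: a p-value below .05 suggests the difference is unlikely to be chance alone
  print(f"t = {t_stat:.2f}, p = {p_value:.3f}")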

Step 15: Analyze Qualitative Data

Instructions

  • Transcribe interviews/focus groups
  • Code responses into themes (manually or using software)

Example Walk-Through

  • Step 1: Read 10 transcripts
  • Step 2: Code mentions of "growth" or "confidence"
  • Step 3: Create summary of common themes with quotes

Tools

  • Manual coding template (Word/Excel)
  • NVivo or Dedoose (optional)
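
For teams coding manually, here is a minimal Python sketch of a keyword-tallying first pass on the transcripts; the folder, file names, and starter codes are hypothetical, and this is only a starting point since genuine thematic analysis still requires a person reading the transcripts and refining the codes:

  from pathlib import Path
  from collections import Counter

  # Hypothetical folder of de-identified, plain-text transcripts
  transcripts = Path("transcripts").glob("*.txt")

  # Starter codes drawn from the evaluation questions; refine as new themes emerge
  codes = ["growth", "confidence", "leadership", "mentor"]

  counts = Counter()
  quotes = {code: [] for code in codes}

  for path in transcripts:
      for line in path.read_text(encoding="utf-8").splitlines():
          lowered = line.lower()
          for code in codes:
              if code in lowered:
                  counts[code] += 1
                  quotes[code].append((path.name, line.strip()))  # keep the quote for the report

  print(counts)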

Step 16: Address Potential Errors/Invalidities

Instructions

  • Identify sources of error (bias, measurement flaws)
  • Use triangulation (data/method/personnel) for credibility

Example

  • Used both surveys and interviews to assess engagement

Tools

  • Validity Threats Checklist (Trochim)
  • Triangulation Planning Worksheet

Step 17: Interpret Findings

Instructions

  • Return to original questions
  • Compare findings with literature and logic model

Example

  • “Youth who participated more often reported higher confidence, aligning with Robinson’s model (2010).”

Tools

  • Interpretation Worksheet
  • Summary Report Template

Step 18: Make Recommendations

Instructions

  • Create 2–4 practical recommendations
  • Prioritize based on impact and feasibility 

Example

  • Offer leadership program more frequently; assign mentors

Tools

  • Recommendation Matrix
  • Action Planning Template (Community Tool Box)

Step 19: Implement and Learn

Instructions

  • Share results internally and externally
  • Integrate learnings into planning
  • Schedule next evaluation

Example

  • Shared report with board and local schools; updated curriculum for next year

Tools

  • Evaluation Utilization Checklist
  • Learning Log Template

Final Note

This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.

So, again, is this what I really wanted? No, not exactly. It’s still a litany of steps, now with the addition of examples and tools, and like GPT 1 it is potentially useful for an organization that wants to begin doing its own evaluations with technical assistance from an evaluator, or at least someone with a modicum of evaluation training. The point, however, is to eventually get away from evaluators and technical assistants, and neither GPT actually gets us there. I put together a partial side-by-side comparison of GPT 1 and GPT 3 so you can judge whether GPT 3 is an improvement:

[Table: partial side-by-side comparison of GPT 1 and GPT 3]

To me, GPT 3 doesn’t provide enough detail and explanation. While the expanded In House Evaluation GPT is better, an organization will need even more detailed instructions to carry out each step, beyond simply being told what to do with a few measly examples and tools. For example, in Step 19 GPT 3 says to share results internally and externally, integrate learnings into planning, and schedule the next evaluation. Okay, so how does an organization share results internally and externally, how does it integrate learning into planning, and how does it schedule the next evaluation? The examples are, at best, vague and need more context and detailed explanation if they are going to be used as guidance. As for the tools, the next GPT should explain how each tool is used and when it’s used, give an example or two of how it’s been used by other organizations, explain how to find the tools, and provide instructions for their use. It’s helpful, but not in the way I envision it being helpful.

And that concludes Part I. In Part II, I am going to attempt one more GPT in this series and then move on to creating a GPT for an actual program or project evaluation, basing its construction on actual program data. Wish me luck. Thanks for reading. As always, I encourage your thoughts and comments.

