Evaluation disrupted, Part I: It’s not if AI, it’s when AI, and that when is now…
Dr. Jeffrey Sheldon
Introduction
As the title says, AI is here, it’s a disruptive innovation, and it’s coming to disrupt evaluation apace whether we want it to or not. I’ve written about AI’s role in research and evaluation in previous articles, but contrary to my earlier belief that AI and the social science research methods that provide the scaffolding for this work are complementary, I now believe AI will wholly replace those methods, eventually cutting out the evaluator as we currently know them. As an aside, if you’re contemplating a graduate degree in evaluation, I would advise against it. Your best bet would be a certificate so you understand evaluation fundamentals, followed by some sort of technology training or advanced degree, because that’s where the game is and will be for the foreseeable future; I think you’ll be much better off and happier for it.

Why? I predict those of us who already have the graduate-level education, training, and experience to conduct evaluations are going to be out of a job sooner rather than later, once the organizations we work with (or might work with) figure out they can use AI to do their own evaluations at a fraction of the cost. Who needs an evaluator when technology can do the job more efficiently, more effectively, and, to put it in economic terms, with fewer externalities? And isn’t a vertical organizational structure that relies on in-house capacity better than a horizontal one that requires all manner of contractors and consultants? Optimally, if we’re being honest, we should be working our way out of our roles as evaluators anyway, building capacity and then handing it over to organizations. Heresy, I know.

Besides, most organizations use external evaluators unless they have pockets deep enough for an in-house person, and in this moment of economic contraction there will likely be less money available for evaluation, so an organization’s ability to conduct its own evaluations using a GPT just makes sense. Again, why hire an external evaluator, or keep an internal one on payroll, when an organization has the capacity, or soon will have the capacity, to conduct its own evaluations? That said, my other prediction is that if we want any semblance of evaluation work in the future, our new role will have to be providing technical assistance to organizations so they can set up their own GPTs to conduct their own evaluations in toto. In essence, an evaluator or researcher might assist in writing the GPT as a point of departure, but organization staff members will do the rest.

So, out of curiosity, I decided to test my first prediction using two sample GPTs I created, framed by this basic question: what might non-profit organizations (NPOs), not-for-profit organizations (NFPOs), and non-governmental organizations (NGOs) want AI to do in terms of self-guided evaluation of their programs, projects, or organizational functions and processes? Below is the first prompt I put into ChatGPT-4o; the results follow. It took more than a couple of tries to produce a GPT that was remotely close to what I was trying to create, and to be honest I don’t think I quite got there in the end. Oh well, it’s a process, and this is only Part I. It’s a long piece, so settle in and read on.
Prompt 1
I need a GPT for a non-profit organization, not-for-profit organization, or non-government organization - regardless of size, location, or mission - so it can conduct its own program, project, or organizational function and process evaluations from start to finish without the assistance of either an internal or external evaluator. This GPT should be for people in those three types of organizations who want to evaluate their work but don't necessarily have any formal training or experience in conducting evaluations. Basically, this would be a stand-alone way for organization staff members to conduct their own, in-house evaluations. This GPT should help organizational staff members to be able to:
Let’s see what this first prompt yielded, remembering what I asked it to do. As a caveat, with ChatGPT or any similar tool you should check that it actually did what you wanted; it is not infallible, and you may need to revise (and revise) the prompt until you get the results you’re looking for. If you’re not initially clear on what you want it to do, it’s not going to give you what you really want, if that makes sense. You could probably pull all of this out of any evaluation text, but the beauty of AI is how efficiently it gets the job done.
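If you’d rather script that revise-and-rerun loop than keep pasting prompts into the ChatGPT interface, here is a minimal sketch using the OpenAI Python SDK. Everything in it is an assumption on my part: you’d need your own API key, and “gpt-4o” is a stand-in for whichever model you actually use.

```python
# A minimal sketch of the revise-and-rerun loop: send a prompt, read the
# result, and keep refining until the output is what you actually wanted.
# Assumes the openai package is installed and OPENAI_API_KEY is set in the
# environment; "gpt-4o" stands in for whichever model you have access to.
from openai import OpenAI

client = OpenAI()

prompt = "I need a GPT so a non-profit can conduct its own evaluations..."

while prompt:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    # Pressing Enter ends the loop; anything else becomes the new prompt.
    prompt = input("Revised prompt (press Enter to stop): ")
```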
GPT 1
“I've created a full General-Purpose Tool (GPT) for in-house evaluation tailored to non-profit, not-for-profit, and NGO staff without formal evaluation training. You can now view, edit, and build on it directly in the canvas. Let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.”
General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings
Purpose
This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.
Step 1: Planning and Preparation
1. Allocate Resources
2. Define the Focus
3. Conduct a Literature Review
4. Formulate Evaluation Questions and Hypotheses
Example questions
Step 2: Design the Evaluation
5. Select an Evaluation Design
6. Select Data Collection Methods
7. Identify Ethical Considerations
8. Develop/Adapt Instruments
Types
9. Determine Sampling Strategy
Step 3: Implementation
10. Secure Ethical Approval (if needed)
11. Obtain Informed Consent
12. Administer Data Collection Tools
13. Data Cleaning and Storage
Step 4: Data Analysis
14. Analyze Quantitative Data
15. Analyze Qualitative Data
16. Address Potential Errors/Invalidities
Step 5: Interpretation and Use
17. Interpret Findings
18. Make Recommendations
19. Implement and Learn
Resources & Templates (recommended for adaptation):
Final Note
This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.
So, is this what I really wanted? No, not exactly; it’s just a litany of steps. It’s potentially useful for an organization that wants to begin doing its own evaluations, and it’s a point of departure, but it doesn’t go far enough in terms of detail and explanation. It’s nothing more than a framework, a guide that could be found in any evaluation text, as noted, and you’d likely still need an evaluator, or at least someone with a modicum of evaluation training, to provide technical assistance. It’s helpful, but not in the way I had hoped. What I want is for an organization staff member to say, “evaluate the effectiveness/process/outcomes/impacts of this program/project,” add some pertinent information, and, voilà, the evaluation is completed in short order (I sketch what that interaction might look like toward the end of this piece). ChatGPT did, however, say “let me know if you want any section expanded—such as adding example evaluation questions, specific survey tools, or walk-throughs of data analysis steps.” Why not? Here goes:
Prompt 2
Please add to the In House Evaluation GPT instructions for how to carry out each step, provide examples for each element of each step, provide specific tools for each step as necessary, and do a walk-through of the steps of a basic quantitative data analysis and the steps of a basic qualitative data analysis.
Let’s see what we got.
GPT 2
Oops, crickets. Let’s try that again with more clarifying instructions. Hope this gets us there, or at least closer…
Prompt 3
The In House Evaluation GPT is good, but an organization needs step-by-step instructions to carry out each step including examples. Please add to the In House Evaluation GPT instructions to carry out steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide examples for each element of steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Provide specific tools for steps 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19. Do a walk-through of the steps of a basic quantitative data analysis and a basic qualitative data analysis.
I really hope the third GPT is better than the last. The question: was I clear enough in what I asked ChatGPT to do? Let’s take a look.
GPT 3
“The In-House Evaluation GPT has been expanded with detailed instructions, real-world examples, and specific tools for Steps 1 through 19. It also now includes walk-throughs for basic quantitative and qualitative data analysis. You can view and continue editing it in the canvas. Let me know if you'd like templates, sample instruments, or walk-throughs in downloadable formats.”
General-Purpose Tool (GPT) for In-House Evaluation in Non-Profit, Not-for-Profit, and NGO Settings
Purpose
This general-purpose tool (GPT) is designed for staff in non-profit, not-for-profit, and non-governmental organizations (NGOs) who want to independently conduct evaluations of their programs, projects, or organizational processes. It assumes no formal training in evaluation and provides a step-by-step, self-contained guide to build internal capacity and ensure responsible, ethical, and rigorous evaluations.
Step-by-Step Evaluation Process with Instructions, Examples, and Tools
Step 1: Allocate Resources
Instructions
Example
Tools
Step 2: Define the Focus
Instructions
Example
Tools
Step 3: Conduct a Literature Review
Instructions
Example
Tools
Step 4: Formulate Evaluation Questions and Hypotheses
Instructions
Example
Tools
Step 5: Select an Evaluation Design
Instructions
Decide based on resources and purpose:
Example
Tools
Step 6: Select Data Collection Methods
Instructions
Match method to question:
Example
Tools
Step 7: Identify Ethical Considerations
Instructions
Example
Tools
Step 8: Develop or Find Instruments
Instructions
Example
Tools
Step 9: Determine Sampling Strategy
Instructions
Example
Tools
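A quick aside from me, not the GPT: for a step like this, a few lines of Python can draw a defensible simple random sample. The roster below is hypothetical; a real organization would load its actual participant list.

```python
# Simple random sampling sketch: draw 30 names from a participant roster.
# The roster is hypothetical; in practice you'd load your real list from a file.
import random

roster = [f"participant_{i}" for i in range(1, 201)]  # 200 enrolled participants

random.seed(42)  # fixed seed so the draw can be reproduced and audited
sample = random.sample(roster, k=30)
print(sample)
```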
Step 10: Secure Ethical Approval
Instructions
Example
Tools
Step 11: Obtain Informed Consent
Instructions
Example
Tools
Step 12: Administer Data Collection Tools
Instructions
Example
Tools
Step 13: Store and Clean Data
Instructions
Example
Tools
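Another aside from me: to make “store and clean data” concrete, a small pandas sketch might look like the following. The file name and column names are invented for illustration.

```python
# Basic data-cleaning sketch with pandas: de-duplicate, check missingness,
# normalize a text field, and save a clean copy. "survey.csv" and the
# column names are invented.
import pandas as pd

df = pd.read_csv("survey.csv")

df = df.drop_duplicates()                        # remove duplicate submissions
print(df.isna().sum())                           # missing values per column
df["site"] = df["site"].str.strip().str.lower()  # standardize site names
df = df.dropna(subset=["leadership_score"])      # drop rows missing the outcome

df.to_csv("survey_clean.csv", index=False)       # store the cleaned copy
```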
Step 14: Analyze Quantitative Data
Instructions
Example Walk-Through
Q: Does leadership score differ by attendance?
Tools
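An aside from me, not the GPT: taking the walk-through question at face value, a basic analysis could run as follows. The scores are invented, and an independent-samples t-test from scipy is just one reasonable choice of test.

```python
# Sketch: compare leadership scores between high- and low-attendance groups
# with an independent-samples t-test. The scores below are invented.
from scipy import stats

high_attendance = [78, 85, 90, 72, 88, 95, 81, 79]
low_attendance = [65, 70, 58, 74, 62, 68, 71, 60]

t_stat, p_value = stats.ttest_ind(high_attendance, low_attendance)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) would suggest the groups' scores differ.
```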
Step 15: Analyze Qualitative Data
Instructions
Example Walk-Through
Tools
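One more aside: here is a deliberately crude sketch of descriptive qualitative coding, tallying how often hand-picked codes appear across interview excerpts. The excerpts and codebook are invented, and real thematic analysis takes far more judgment than keyword matching.

```python
# Crude qualitative-coding sketch: tally keyword-based codes across
# interview excerpts. Excerpts and codebook are invented examples.
from collections import Counter

excerpts = [
    "I felt more confident leading meetings after the program.",
    "Scheduling was hard, but the mentorship made me feel supported.",
    "I gained confidence, though transportation was a barrier.",
]

codebook = {
    "confidence": ["confident", "confidence"],
    "support": ["supported", "mentorship"],
    "barriers": ["hard", "barrier", "transportation"],
}

counts = Counter()
for text in excerpts:
    lowered = text.lower()
    for code, keywords in codebook.items():
        if any(k in lowered for k in keywords):
            counts[code] += 1

print(counts)  # e.g., Counter({'confidence': 2, 'barriers': 2, 'support': 1})
```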
Step 16: Address Potential Errors/Invalidities
Instructions
Example
Tools
Step 17: Interpret Findings
Instructions
Example
Tools
Step 18: Make Recommendations
Instructions
Example
Tools
Step 19: Implement and Learn
Instructions
Example
Tools
Final Note
This GPT is a living document. Organizations are encouraged to adapt it, build in community voice, and revisit evaluation practices regularly to improve learning, accountability, and impact.
So, again, is this what I really wanted? No, not exactly. It’s still a litany of steps, now with the addition of examples and tools, and, like GPT 1, potentially useful for an organization that wants to begin doing its own evaluations with technical assistance from an evaluator or at least someone with a modicum of evaluation training. The point, however, is to eventually get away from evaluators and technical assistants, and neither GPT actually gets us there. I put together a partial side-by-side comparison of GPT 1 and GPT 3 so you can judge whether GPT 3 is an improvement:
To me, GPT 3 doesn’t provide enough detail and explanation. While the expanded In-House Evaluation GPT is better, an organization will need even more detailed instructions to carry out each step, beyond simply being told what to do with a few measly examples and tools. For example, in Step 19 GPT 3 says to share results internally and externally, integrate learnings into planning, and schedule the next evaluation. Okay, so how does an organization share results internally and externally, how does it integrate learning into planning, and how does it schedule the next evaluation? The examples are, at best, vague and need more context and detailed explanation if they are going to be used as guidance. Regarding the tools, the next GPT should provide a detailed explanation of how each tool is used, when it’s used, an example or two of how it’s been used by other organizations, how to find it, and instructions for its use. It’s helpful, but not in the way I envision it being helpful.
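To close with a concrete picture of the end-state I keep describing, here is a rough sketch of the interaction I’m imagining: the evaluation brief lives in a system prompt, and a staff member supplies only the program details. The role text, program description, and model name are all illustrative assumptions on my part, not a working evaluation GPT.

```python
# Sketch of the "hand it program details, get an evaluation back" workflow.
# The system prompt, staff request, and model name are illustrative only.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are an in-house evaluation assistant for a non-profit. Given a "
    "program description and its data, propose evaluation questions, a "
    "design, an analysis plan, findings, and recommendations."
)

staff_request = (
    "Evaluate the outcomes of our after-school tutoring program. We have "
    "pre/post reading scores for 80 students over one school year."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": staff_request},
    ],
)
print(response.choices[0].message.content)
```

A real version would need the organization’s data attached and plenty of iteration, which is exactly where the technical assistance role I predicted earlier comes in.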
And that concludes Part I. In Part II I’ll attempt one more GPT in this series, then move into creating a GPT for an actual program or project evaluation, basing its construction on real program data. Wish me luck. Thanks for reading. As always, I encourage your thoughts and comments.