Data collection methodologies for measuring AI impact
Measuring the impact of AI coding tools requires a multi-dimensional approach that captures both quantitative metrics and qualitative insights. The AI Measurement Framework provides clear guidance on what to measure -- across the dimensions of utilisation, impact, and cost -- but you still need to figure out how to gather that data.
Below are three complementary data collection methods that, together, offer a well-rounded approach to measuring the impact of AI tooling.
1. Tool-Based Metrics
Most AI tools have admin APIs that let you track usage, spend, token consumption, and code suggestion acceptance rates. System-level metrics from the other tools in your development process -- GitHub, JIRA, Linear, your CI/CD and build systems, and incident management systems -- will help you see other shifts that result from AI adoption, like changes in pull request throughput or review latency.
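As a concrete illustration, here is a minimal sketch of pulling daily usage data from an AI tool's admin API. It assumes GitHub's Copilot metrics endpoint (`GET /orgs/{org}/copilot/metrics`); the exact path and response fields vary by vendor, plan, and API version, so treat the field names as placeholders and check your tool's documentation.

```python
# Minimal sketch: pull daily AI-tool usage from an admin API.
# Assumes GitHub's Copilot metrics endpoint; paths and field names
# vary by vendor and API version -- treat them as placeholders.
import os
import requests

ORG = "your-org"  # hypothetical organization slug
TOKEN = os.environ["GITHUB_TOKEN"]  # token with the required admin scope

resp = requests.get(
    f"https://api.github.com/orgs/{ORG}/copilot/metrics",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    timeout=30,
)
resp.raise_for_status()

for day in resp.json():
    # 'date' and 'total_active_users' reflect current API docs, but verify
    # against your vendor's schema before relying on them.
    print(day.get("date"), day.get("total_active_users"))
```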
2. Periodic Surveys
Periodic surveys -- typically quarterly -- are effective for capturing longer-term trends. Over time, surveys help track whether the developer experience is improving or degrading as AI usage scales. Surveys are also very useful for measuring outcomes that span multiple systems, where stitching together tool-based metrics may be unwieldy or impractical. Developer satisfaction and other perceptual measures (such as change confidence or perceived maintainability of source code) can't be captured from system data, so surveys are the right choice there.
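To make the perceptual measures above concrete, here is a small sketch of how quarterly survey items might be defined as data. The question wording and five-point scale are illustrative, not a prescribed instrument.

```python
# Illustrative quarterly survey items covering perceptual measures
# that system data can't capture. Wording and scale are examples only.
LIKERT_5 = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

SURVEY_ITEMS = [
    {"id": "dev_satisfaction", "scale": LIKERT_5,
     "text": "I am satisfied with the AI coding tools available to me."},
    {"id": "change_confidence", "scale": LIKERT_5,
     "text": "I feel confident that my changes won't break production."},
    {"id": "perceived_maintainability", "scale": LIKERT_5,
     "text": "The code I work in is easy to understand and modify."},
]
```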
3. Experience Sampling
Experience sampling involves asking a single, highly targeted question at the point of work. For instance, after submitting or reviewing a pull request, you might ask, “Did you use AI to write this code?” or “Was this code easier or harder to understand because it was AI-generated?” These responses yield granular, in-the-moment feedback that’s difficult to capture through periodic surveys or telemetry alone.
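As one possible implementation, the sketch below listens for GitHub pull request webhooks and, when a PR is merged, posts a single experience-sampling question to a Slack channel via an incoming webhook. The Slack webhook URL and the decision to sample every merged PR are assumptions; in practice you would likely sample a subset of PRs and route the question to the author.

```python
# Sketch: experience sampling at the point of work.
# Listens for GitHub "pull_request" webhooks and asks one question in Slack
# when a PR is merged. The Slack webhook URL is a placeholder; in practice
# you would sample a subset of PRs and target the author directly.
import os
import requests
from flask import Flask, request

app = Flask(__name__)
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical incoming webhook

@app.route("/github-webhook", methods=["POST"])
def github_webhook():
    event = request.headers.get("X-GitHub-Event")
    payload = request.get_json(silent=True) or {}

    if event == "pull_request" and payload.get("action") == "closed":
        pr = payload.get("pull_request", {})
        if pr.get("merged"):
            question = (
                f"Quick question about <{pr.get('html_url')}|{pr.get('title')}>: "
                "Did you use AI to write this code? (yes / partially / no)"
            )
            requests.post(SLACK_WEBHOOK_URL, json={"text": question}, timeout=10)

    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```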
There's usually more than one way to get to a data point. For example, you can measure PR throughput from GitHub's system data on pull requests and reviews, or you can ask developers directly: "In the past month, how frequently have you merged new changes that you were the author of?"
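For example, the sketch below computes merged-PR counts per author over the past month from GitHub's REST API, which you could then compare against the survey question above. The repository name is a placeholder, and pagination and rate-limit handling are omitted for brevity.

```python
# Sketch: measure PR throughput from system data (merged PRs per author,
# last 30 days) so it can be cross-checked against survey responses.
# Repository is a placeholder; pagination and rate limits are omitted.
import os
from collections import Counter
from datetime import datetime, timedelta, timezone

import requests

REPO = "your-org/your-repo"  # hypothetical repository
TOKEN = os.environ["GITHUB_TOKEN"]
since = datetime.now(timezone.utc) - timedelta(days=30)

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    headers={"Authorization": f"Bearer {TOKEN}",
             "Accept": "application/vnd.github+json"},
    params={"state": "closed", "sort": "updated", "direction": "desc",
            "per_page": 100},
    timeout=30,
)
resp.raise_for_status()

merged_per_author = Counter()
for pr in resp.json():
    merged_at = pr.get("merged_at")  # null for closed-but-unmerged PRs
    if merged_at and datetime.fromisoformat(merged_at.replace("Z", "+00:00")) >= since:
        merged_per_author[pr["user"]["login"]] += 1

for author, count in merged_per_author.most_common():
    print(f"{author}: {count} merged PRs in the last 30 days")
```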
By layering these three methods -- tool-based metrics, periodic surveys, and experience sampling -- you can build a measurement approach that’s both comprehensive and resilient (DX gives you these out of the box). Each method has its strengths, and together they allow you to cross-validate data and stay grounded in what’s actually happening in your org. This data creates a feedback loop that lets you make smarter decisions about your AI strategy as AI becomes a deeper part of your development processes.