Evaluating the Ambiguous – Measuring Hypervelocity Engineering Success
Following my posts on Becoming an AI Engineering Team, What is Hypervelocity Engineering, and Start Slow and Accelerate, I want to tackle one of the most challenging aspects of adoption: how do you know if it's actually working? Hypervelocity Engineering (HVE) is about leveraging AI and reusable patterns to accelerate the journey from raw idea to production software, freeing teams to focus on meaningful, creative work. Unlike traditional team metrics where success is often discipline-specific, measuring the effectiveness of human-AI collaboration across Design, Engineering, Project Management, Data Science, and Security requires us to think more holistically about value creation.
This is perhaps the most speculative post in the series so far. We're all still figuring out what "good" looks like when multi-disciplinary teams collaborate with AI, and the measurement approaches I'm sharing here are experiments in progress, not proven methodologies. But without some framework for evaluation, teams risk either abandoning promising approaches too early or persisting with practices that aren't delivering value.
What measurement challenge is your team facing as you integrate AI into your workflows? I'd love to hear your thoughts as we explore this together.
Learning from Established Frameworks
GitHub's Engineering System Success Playbook offers valuable themes we can adapt for HVE contexts: focusing on developer experience, measuring both velocity and quality of outcomes, and emphasizing continuous feedback loops. While their framework wasn't designed with AI collaboration in mind, the core principles of measuring what matters to your team's daily experience translate well across disciplines in our hybrid human-AI workflows.
The key insight from their approach is that successful measurement combines quantitative metrics with qualitative feedback, recognizing that the most important outcomes often can't be captured by numbers alone – whether you're talking about code quality, design effectiveness, or business value.
Setting the Stage
Before we get into the metrics themselves, I want to set the stage by talking about the broader context – ways metrics can go wrong, how to solve for that (using AI to help), and how to think about the act of measuring in the more holistic context of growing the AI muscles of your team.
Watch for Perverse Incentives and Anti-Patterns
Perhaps more important than tracking positive metrics is watching for the ways measurement can go wrong in HVE environments across different disciplines.
Teams can begin Gaming the Metrics. Project managers could start inflating velocity estimates to show AI impact. Designers may use AI only for simple asset generation to boost the success rate of AI-generated designs, rather than pursuing the design they believe is best. SDEs may leave prompt-engineering time out of their estimates of the time saved through AI tools.
Keep watch over time, as a simple metric like “time to PR approval” can work well at first and then suddenly drive undesirable behavior if your teams believe they are being judged on that value (think of KLOC and some of the code bloat that metric generated).
You can be vulnerable to Measurement Theater: Spending more time measuring AI effectiveness than actually improving cross-functional workflows, or creating elaborate dashboards that no one acts upon while ignoring qualitative feedback from team members.
Your team may suffer from Professional Insecurity Responses. These show up differently across disciplines; here are some to watch out for. Designers might over-emphasize AI-generated concepts to appear cutting-edge. Data scientists might under-report AI assistance to protect their analytical expertise. Security experts might avoid AI collaboration to maintain their role as the "human firewall."
Keep watch over time, as a simple metric like 'time to PR approval' can work well at first and suddenly drive undesirable behavior if your teams believe they are being judged on that value
Siloed Evaluation, or measuring AI impact within individual disciplines without considering cross-functional effects, can miss key improvements in the overall process. For instance, AI might slow down initial Data Science work, but the improvement in AI-coauthored DS code and tests may make DS-to-SDE handovers much smoother.
The most dangerous anti-pattern is measuring AI performance in isolation rather than evaluating how human-AI collaboration affects the entire product development lifecycle.
How have you seen measurement go wrong in your team's AI adoption journey? What warning signs should other teams watch for?
Course-Correcting When Things Go Sideways
The beauty of treating HVE as an ongoing experiment is that course correction becomes part of the process, not a sign of failure. Here are some tactics for when metrics suggest things aren't working across different functions.
Involve AI in cross-functional problem-solving: Ask your AI tools to analyze patterns across different discipline feedback and suggest alternative approaches. Sometimes AI can spot solutions that span functional boundaries in ways individual team members might miss. For instance, we spotted in some survey results that our data scientists were less confident using AI tools than our SDEs. Digging in, we realized we needed to make more time for the data scientists to benefit from experience sharing, so they could see how quickly the competence of these models was increasing, and we also needed to involve them more heavily in designing our scoring rubrics, as some evaluations were generating false confidence.
Revisit your measurement framework: Poor metrics might indicate you're measuring the wrong things, not that HVE isn't working. Be especially cautious about metrics that seem to improve in one area while degrading in another, as this often signals measurement misalignment rather than actual problems.
Scale back strategically: Rather than abandoning AI assistance entirely, be intentional in your experimentation. Learn what is working for you and your team, and what isn’t, and quickly pivot away from areas that aren’t providing value. Document why you abandon approaches – this field is changing rapidly, and approaches you abandon now as infeasible might be possible within a few months.
Cross-pollinate learnings: Security teams might discover prompting techniques that help design teams, or project management workflows might inform data science evaluation approaches. Make sure insights can flow across disciplinary boundaries. We use cross-discipline Teams meetings to share emerging insights and best practices, document our experiments with Markdown-based experiment templates (with synopses in Loop for easy discoverability), and run Viva Pulse surveys to track qualitative metric feedback over time. One important point we make with all of our teams is that there is no one right way at this point: things are changing fast enough that we're all learning from each other, no matter the discipline or level. We're evolving in our methods, though, so I would love to hear what your teams are using!
There is no “one right way” to use AI in your engineering engagements; we're all learning from each other across disciplines and levels, and the tools are exponentially increasing in capability
Set Realistic Expectations
AI won't solve all your team's challenges overnight. Pretending otherwise sets up everyone for disappointment. Creating measurement rubrics and cadences gives your teams the confidence to experiment with new techniques while providing stakeholders with concrete evidence of where AI delivers value, and where it currently falls short. This transparency builds trust by demonstrating you're approaching AI adoption thoughtfully, not "vibe coding" your way toward production issues.
The Meta-Challenge of Holistic Measurement
Perhaps the most challenging aspect of measuring HVE success is that we're trying to evaluate a moving target using tools that are themselves rapidly evolving, while balancing the needs and perspectives of multiple disciplines that historically measured success very differently.
The teams that will succeed long-term are those building the evaluation muscles that help them adapt
This is why I believe the most valuable measurement focuses on cross-functional team capabilities and processes rather than specific AI tool performance within individual disciplines. Are you getting better at identifying good use cases for AI assistance across different functions? Is your team developing stronger skills in human-AI collaboration that span traditional role boundaries? Are you building institutional knowledge that will transfer as tools evolve and as the lines between disciplines continue to blur?
The teams that will succeed long-term aren't necessarily those with the best metrics today within any single discipline, but those building the evaluation muscles that help them adapt as both AI capabilities and cross-functional collaboration patterns continue shifting.
Cross-Disciplinary Metrics That Matter
Based on the bottleneck identification I discussed in Start Slow and Accelerate, below are concrete metrics teams are considering or experimenting with across different disciplines. For those that seem subjective or ambiguous, we're using Likert scores or other simple judging rubrics that let us gather quantitative data on different facets and move away from “judging on vibes”, and we'll be adapting them over time. Having different teams use different (but potentially overlapping) criteria can be a good way to broaden your experimentation and coalesce more quickly on a set of criteria that works for you.
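To make the judging-rubric idea concrete, here is a minimal sketch in Python of how a team might encode Likert-scored facets and average them across reviewers. The facet names and scores are purely illustrative assumptions, not a rubric I'm prescribing; each team should define its own facets.

```python
from dataclasses import dataclass
from statistics import mean

# Illustrative facets for judging an AI-assisted deliverable, each scored 1-5.
# These facet names are placeholders -- define your own per discipline.
FACETS = ["clarity", "correctness", "maintainability", "time_saved"]

@dataclass
class RubricScore:
    reviewer: str
    scores: dict        # facet name -> 1-5 Likert score
    notes: str = ""     # optional open-ended comment

def aggregate(reviews: list[RubricScore]) -> dict:
    """Average each facet across reviewers so trends are comparable over time."""
    return {
        facet: round(mean(r.scores[facet] for r in reviews), 2)
        for facet in FACETS
    }

# Example usage with made-up data:
reviews = [
    RubricScore("alice", {"clarity": 4, "correctness": 5, "maintainability": 3, "time_saved": 4}),
    RubricScore("bob",   {"clarity": 3, "correctness": 4, "maintainability": 4, "time_saved": 5},
                notes="Prompt iteration took longer than expected."),
]
print(aggregate(reviews))
# -> {'clarity': 3.5, 'correctness': 4.5, 'maintainability': 3.5, 'time_saved': 4.5}
```

Even something this small moves the conversation from "it felt faster" to "time_saved is trending up while maintainability is flat", which is a much better starting point for a retro.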
Broken out by discipline, here are some examples of the metrics our teams are considering or experimenting with. I would love to hear what you and your teams are using.
Design and User Experience:
Project Management and Stakeholder Communication:
Data Science and Analytics:
Security and Compliance:
The goal isn't to prove AI is "better" - it's to understand where human-AI collaboration creates genuine value across your entire team's workflow.
Designing Effective Qualitative Measurement
One of the most practical approaches I've seen teams adopt borrows from data science evaluation practices, but simplified for cross-functional contexts. The key is creating systematic approaches to capture and interpret qualitative feedback that would otherwise be lost in the noise of daily work.
End-User Satisfaction Surveys: Beyond Basic Ratings
Rather than simple "How satisfied are you?" surveys, effective HVE measurement requires more nuanced questioning. Consider structuring surveys around specific collaboration scenarios:
For stakeholders receiving AI-enhanced project communications:
For team members using AI-assisted workflows:
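To keep scenario-based questions consistent across audiences, a lightweight structure helps. Here's a minimal sketch in Python; the question wording is hypothetical and only meant to show the shape (a shared Likert scale, a few scenario-specific statements per audience, and one open-ended follow-up), not the exact questions we use.

```python
# Illustrative only: scenario-based survey questions with a shared scale.
LIKERT_SCALE = "1 = strongly disagree ... 5 = strongly agree"

SURVEYS = {
    "stakeholders_receiving_ai_enhanced_comms": [
        "The AI-assisted status updates gave me the detail I needed to make decisions.",
        "I could tell when information was AI-summarized, and I trusted its accuracy.",
    ],
    "team_members_using_ai_assisted_workflows": [
        "AI assistance reduced the time I spent on rote tasks this sprint.",
        "I felt confident reviewing and correcting AI-generated output in my discipline.",
    ],
}
OPEN_ENDED_FOLLOWUP = "Describe one moment where AI collaboration clearly helped or hurt."

def render_survey(audience: str) -> str:
    """Render a plain-text survey for one audience, pairing each Likert item
    with the shared scale and a single open-ended follow-up."""
    lines = [f"Survey for: {audience}", f"Scale: {LIKERT_SCALE}", ""]
    lines += [f"{i}. {q}" for i, q in enumerate(SURVEYS[audience], start=1)]
    lines += ["", f"Open-ended: {OPEN_ENDED_FOLLOWUP}"]
    return "\n".join(lines)

print(render_survey("team_members_using_ai_assisted_workflows"))
```

Keeping the scale and follow-up identical across audiences is what lets you compare responses later; only the scenario statements change.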
Systematic Qualitative Data Capture
Establish regular "qualitative checkpoints" beyond traditional retrospectives, and use those to refine and improve your HVE workflow:
The key is creating structure around qualitative feedback while keeping the cognitive load manageable. Use consistent language and scales, but allow for open-ended responses that can reveal unexpected insights.
Analyzing and Acting on Qualitative Data
Qualitative feedback can be tough to turn into actionable insights; you can do so through:
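One lightweight option, purely as an illustration: tag each open-ended response with recurring themes and track how often those themes appear at each checkpoint. The sketch below assumes the tagging has already happened (by a reviewer or an AI assistant), and the data shown is made up.

```python
from collections import Counter

# Illustrative sketch: turning tagged open-ended feedback into trend signals.
feedback = [
    {"checkpoint": "2025-05", "themes": ["prompt_fatigue", "faster_reviews"]},
    {"checkpoint": "2025-06", "themes": ["prompt_fatigue", "handover_friction"]},
    {"checkpoint": "2025-06", "themes": ["prompt_fatigue"]},
]

def theme_counts_by_checkpoint(entries):
    """Count how often each theme appears in each checkpoint period."""
    counts = {}
    for entry in entries:
        period = counts.setdefault(entry["checkpoint"], Counter())
        period.update(entry["themes"])
    return counts

for period, themes in sorted(theme_counts_by_checkpoint(feedback).items()):
    print(period, dict(themes))
# A theme that keeps growing (here, "prompt_fatigue") is a prompt to dig deeper
# with the affected discipline, not a verdict on its own.
```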
Tracking Business Value and ROI
Over the medium to long term, you’ll want to track business value and ROI in a quantitative way. GitHub’s ESSP comes in handy again here, as they outline some concrete business value metrics in their playbook. One that I find particularly valuable in the context of HVE is Feature Engineering Expenses vs. Total Engineering Expenses. The goal of HVE, at least in the beginning, as I’ve laid out in my other posts, is to help reduce and remove bottlenecks and to ease the burden of more rote engineering tasks, leaving impactful and interesting features to be co-designed and co-developed by your team and AI in tandem. Understanding how much of your effort is going into feature development, and how that’s changing as AI is adopted on your teams, can be a good indicator of ROI for those AI investments. However, business value metrics are wide and varied, so I would love to hear: what are you using for tracking ROI in this new AI-enabled landscape?
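As a purely illustrative sketch of what tracking that ratio could look like (the numbers are made up, and your actual cost data will come from your own finance tooling), something this simple is enough to start a trend line:

```python
# Illustrative only: share of engineering spend going to feature work over time.
quarters = {
    "2025-Q1": {"feature_engineering_cost": 420_000, "total_engineering_cost": 1_000_000},
    "2025-Q2": {"feature_engineering_cost": 510_000, "total_engineering_cost": 1_000_000},
}

for quarter, spend in quarters.items():
    ratio = spend["feature_engineering_cost"] / spend["total_engineering_cost"]
    print(f"{quarter}: {ratio:.0%} of engineering spend went to feature work")
# If the ratio trends upward as AI adoption matures, that's one signal the rote,
# non-feature work is genuinely shrinking; worth correlating with your other metrics.
```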
Your Turn: Share Your Measurement Journey
I'm particularly curious about your experiences with measuring AI adoption across different disciplines. Some questions for discussion:
The measurement playbook for HVE is still being written, and your experiences could help other teams avoid common pitfalls while identifying promising approaches. Share your thoughts, challenges, and successes in the comments and let's build this knowledge base together.
#HypervelocityEngineering #AIEngineering #TechLeadership #SoftwareDevelopment #EngineeringExcellence #AIProductivity #TechInnovation #EngineeringTeams #TechMetrics #CrossFunctionalTeams