It's time for the third article in my six-part series on Operationalising #AgenticAI, and this is one we are excited to share, as it's another of the least-developed areas of Agentic AI, yet one that is ultimately critical for successful and safe adoption at scale. Special thanks to Futuria's Luther Power for his contributions to this article, and to guest collaborator Chris Jefferson, co-founder and CTO at Advai. Whilst Chris and I are relatively new to each other, we found a common passion and language for this topic across our teams immersed in this Agentic world. This is most certainly not an article to read and then think that you need to do all of it tomorrow. Indeed, if anyone told me they had an approach to benchmark multi-agent team quality, I wouldn't believe them. It is very much an emergent space. However, these are some of the things we are thinking about each week in the work we are doing, both in the US and the UK. Check in next week for Pillar 4, which will look at an area I know a number of people are fascinated by - the philosophy of High-Performing Hybrid Teams.
I love the idea that agents could detect when human collaborators are struggling - spotting signals like confusion, hesitation, or missing input. Rather than just flagging issues, they could actively help people improve their knowledge over time, prompting clarification, filling gaps, or surfacing relevant context. That shifts the dynamic from agents simply supporting tasks to improving the resilience and performance of the whole organisation.
An important and often overlooked facet of trust is confidence: to trust a system, you have to be confident it will perform as expected. With #agenticai and #generativeai we are delivering systems that are less procedural and less deterministic, so the level of testing needed to give confidence must go well beyond what we have done before, and beyond what is practically achievable with humans alone. Automated testing with AI is now critical, but humans must validate the automated test frameworks to ensure that we are testing and verifying results correctly. Agentic is about having the best teams of humans and #AI agents working together, each doing what they do best.
Great article, thanks for sharing! Defining bespoke performance benchmarks for agents and teams is absolutely critical as we move beyond traditional LLM evaluation metrics into real-world agent team deployment.
You can find the Pillar 1 article here https://www.linkedin.com/pulse/pillar-1-safety-trust-rob-price-yjw8e/?trackingId=Vti%2BulzCSuKJ0nsVxMpASw%3D%3D
Really interesting analysis. I'm looking forward to reading your take on Hybrid teams.
And sharing for those with a shared passion for #DigitalResponsibility in technology innovation: Dr. Saskia Dörr Karen Elliott Iliana Grosse-Buening Dr Julia Stamm FRSA Deborah Hagar Jessica Butcher MBE Jessica Huntingford Karin Dietl Michael Wade Anna Zongollowicz, PHD and those who have joined me on the #Futurise podcast Hannah Foxwell AI for the rest of us Cien S. Simon Torrance Steve Smith Patrick Debois Andreas Welsch Nicolas Babin Shikoh Gitau Eric Broda Soo Ling Lim Tim Flagg UKAI Matthew Skelton Franck Pivert Suzanne EL-Moursi Prof. Joe O'Mahoney Louise Ballard (Moody) Mackenzie M. Howe