From Prototype to Product – Building GenAI Learning Tools That Scale

"It's not about what AI can do—it's about what people can do with it." – Unknown

In my 15+ year journey working at the intersection of product management, learning experience (LX) design, and training delivery, one thing has remained constant: a desire to bridge the gap between learning and doing. Whether designing VR training for hazardous environments or building tools that help people grow on the job, I've always been driven by the same question—how can we make learning more meaningful, accessible, and human?

That question took on a new dimension in late 2022, when I started exploring how large language models (LLMs) could reshape the way people learn at work. What began as a side-of-the-desk project quickly became something much more.


Learning, Always Learning

My background in LX design gave me a deep appreciation for behavior change, contextual learning, and cognitive load. I'd built learning products across different industries and formats—from Learning Management System (LMS) platforms and classic e-learning courses to mobile-first gamified learning apps. These experiences grounded me in the principles that would later shape my approach to working with AI: empathy, clarity, and structure.

As these projects grew in complexity, I found myself searching for ways to better align learning design with product strategy—tools that could help me prioritize user needs, measure outcomes, and scale solutions effectively. That search led me to pursue formal training as a Digital Product Manager in 2015, where I discovered how to apply product thinking and iterative design methods to learning challenges. This allowed me to connect the dots between user needs, business goals, and technical feasibility—a combination that would prove critical in scaling AI-powered learning solutions.


Image by ChatGPT

GenAI: More Than Just a Shiny Toy

Like many, I was fascinated by the release of ChatGPT (powered by GPT-3.5) in late 2022. But I wasn't interested in using AI for novelty or as a buzzword. I saw a clear, unmet need: making role-based learning more effective at scale.

Traditional training has always come with trade-offs. Instructor-led sessions give learners depth and human connection, but they're hard to scale—and hands-on practice often takes a backseat. On the other end, self-paced eLearning scales well but often lacks relevance, context, and personalization. There's a wide gap between those two extremes, and GenAI felt like a promising way to bridge it. I wanted to explore whether it could offer learners timely, tailored coaching in realistic scenarios—something that felt both scalable and human.

Think about a customer service rep learning to handle difficult conversations. Or a healthcare provider practicing how to deliver sensitive news. Or a manager preparing for a tough performance review.

Instead of watching a video, clicking through an eLearning module, or role-playing once in a classroom, what if they could practice dozens of variations with an AI? What if they got real-time coaching on empathy, communication, and decision-making before ever facing the real-world scenario?

This isn't unlike what Duolingo has done with their AI conversation practice or what Talespin has created for leadership development—using AI characters to help people practice difficult conversations in a safe space. But I was interested in building something that could work across multiple roles and skill levels—and scale across dozens of countries, languages, and business challenges.


Late Nights and Early Prototypes

So I built a prototype.

In my spare time—late nights and weekends—I designed a scenario-based training tool powered by an LLM and supported by a custom prompt engineering framework. The early design was simple: create realistic scenarios where learners could practice critical conversations, then use AI to provide coaching that was personalized and actionable. I tested it with a small group of colleagues who volunteered their time to try early versions and provide feedback.
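To give a sense of how little machinery that early design needed, here is a minimal sketch of it. This is a hypothetical reconstruction, not the actual prototype; the Scenario fields, the prompt wording, and the call_llm stub are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One practice scenario: who the AI plays and what the learner works on."""
    title: str
    persona: str     # the character the AI role-plays
    objective: str   # the skill the learner is practicing

def build_system_prompt(scenario: Scenario) -> str:
    """Compose the role-play instructions sent to the LLM."""
    return (
        f"You are role-playing: {scenario.persona}. Stay in character. "
        f"The learner is practicing: {scenario.objective}. "
        "When the conversation ends, step out of character and give brief, "
        "actionable coaching on how the learner handled it."
    )

def call_llm(system_prompt: str, messages: list[dict]) -> str:
    """Placeholder for a real LLM API call (e.g., via an Anthropic or OpenAI client)."""
    raise NotImplementedError

scenario = Scenario(
    title="Difficult customer call",
    persona="an upset customer whose delivery arrived damaged",
    objective="de-escalation and empathetic communication",
)
print(build_system_prompt(scenario))
```

In a design like this, most of the craft lives in the scenario definitions and the prompt itself, which is exactly where the later prompt engineering work concentrated.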

Around the same time, I noticed that others across the product and learning multiverse were diving into similar explorations—experimenting with GenAI to rethink how learning could work. It started to feel like a space race, with teams testing ideas, sharing wins, and learning fast. The early feedback on my prototype was encouraging—and eventually, it caught the attention of leadership.


From Side Project to Pitch

What followed was a whirlwind. I pitched the concept and demoed the tool to senior leaders, and later had the opportunity to present it to our Amazon Stores CEO, Doug Herrington. From there, I worked closely with technical partners and external experts to explore scalability, security, and long-term integration.

The product evolved from a scrappy prototype into something that could shape learning experiences for hundreds of thousands of people. It didn't happen overnight. It took months of hands-on iteration, technical validation, and close collaboration across disciplines.

Our team expanded to include engineers, designers, program managers and learning specialists. Each brought unique perspectives that transformed what had started as a solo project into something much more robust. Weekly design sessions became a highlight—watching engineers find elegant solutions to complex problems, while we crafted interfaces that made advanced learning technology feel approachable.


Wrestling with AI's Challenges

The journey from prototype to product revealed nuanced technical challenges that aren't often discussed in mainstream AI conversations. Hallucinations—where LLMs confidently generate plausible but incorrect information—became a significant hurdle in learning contexts where accuracy and relevance are non-negotiable.

I ruled out training a custom model, which would have been costly and time-intensive, and focused instead on building a robust prompt engineering framework that could support reliable AI role-play and coaching at scale. By creating structured, multi-layered prompts that define clear evaluation criteria, learning contexts, and coaching parameters, we dramatically improved consistency. The key was finding the right balance between constraint and flexibility: too rigid, and the AI feedback feels mechanical; too loose, and learners receive unpredictable guidance.
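To make "structured, multi-layered prompts" concrete, here is one way such a framework can be organized. This is a simplified sketch under my own naming, not the production framework; the three layers and their wording are illustrative:

```python
# Hypothetical layered prompt: each layer constrains one aspect of the AI's
# behavior, and the framework concatenates them in a fixed order.
LAYERS = {
    "learning_context": (
        "The learner is a customer service representative practicing how to "
        "handle an escalated complaint about a damaged delivery."
    ),
    "evaluation_criteria": (
        "Assess only these dimensions: empathy, clarity, ownership of the "
        "problem, and proposing a concrete next step."
    ),
    "coaching_parameters": (
        "Give feedback in two to three sentences per dimension. Quote the "
        "learner's own words as evidence. Never invent facts about the case."
    ),
}

def assemble_prompt(layers: dict[str, str]) -> str:
    """Join the layers in a stable order so every learner gets the same framing."""
    order = ("learning_context", "evaluation_criteria", "coaching_parameters")
    return "\n\n".join(f"[{name.upper()}]\n{layers[name]}" for name in order)

print(assemble_prompt(LAYERS))
```

Keeping the layers separate makes each constraint auditable and lets one layer change (say, the evaluation criteria) without touching the others.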

Early versions of our scenarios sometimes produced wildly inconsistent feedback: one learner might receive glowing praise while another, with a nearly identical response, would get critical feedback. Left unchecked, these inconsistencies could have undermined trust in the system, and they highlighted the need for a more structured evaluation framework.

Through iterative testing, we developed patterns that ensured fair, unbiased and consistent assessment while preserving the natural, conversational tone that makes AI coaching effective. What surprised me most was how these guardrails actually enhanced, rather than limited, the learning experience—by ensuring consistency and accuracy, learners could focus on skill development instead of questioning the validity of the feedback.
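One common pattern for this kind of consistency problem, sketched below with assumed field names rather than the rubric we actually shipped, is to make the model score against explicit criteria and quote evidence before writing any prose coaching:

```python
# Illustrative rubric-first evaluation prompt. Scoring anchors and field
# names are assumptions, not the production rubric.
RUBRIC = {
    "empathy": "Acknowledged the other person's feelings explicitly (0-2)",
    "clarity": "Stated the situation and next steps unambiguously (0-2)",
    "accuracy": "All stated facts match the scenario brief (0-2)",
}

def build_evaluation_prompt(learner_response: str) -> str:
    """Ask for rubric scores and quoted evidence before any prose coaching."""
    criteria = "\n".join(f"- {name}: {anchor}" for name, anchor in RUBRIC.items())
    return (
        "Score the learner's response against each criterion below, quoting "
        "the response to justify each score, then write the coaching.\n\n"
        f"Criteria:\n{criteria}\n\n"
        f"Learner response:\n{learner_response}\n\n"
        'Respond in JSON: {"scores": {...}, "evidence": {...}, "coaching": "..."}'
    )

# Pairing a prompt like this with a low sampling temperature (e.g., 0.2)
# makes scores for near-identical answers far more repeatable.
```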

Looking ahead, we're exploring even more sophisticated approaches, like retrieval-augmented generation (RAG) to ground responses in domain-specific content, and intelligent workflow systems that could transform how learning adapts to individual needs.
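As a rough illustration of the RAG direction, the sketch below shows the standard retrieve-then-ground loop. The embed() function is a placeholder for whatever embedding model is used, and the prompt wording is an assumption:

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder: in practice this calls an embedding model."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query: str, corpus: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k snippets whose pre-computed embeddings best match the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def grounded_prompt(question: str, snippets: list[str]) -> str:
    """Constrain the model to answer only from the retrieved context."""
    context = "\n---\n".join(snippets)
    return (
        "Answer using ONLY the context below. If the context does not cover "
        f"the question, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
```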


New skill unlocked: Prompt Engineering

My perspective on prompt engineering evolved dramatically throughout this journey. What began as simple instructions to an AI model transformed into a more sophisticated design discipline that balances precision, flexibility, and behavioral insights.

I learned that effective prompts for learning applications aren't just about getting the "right answer"; they're about creating the right conditions for skill development. I drew inspiration from a range of sources, including prompt libraries and research from OpenAI, Anthropic, and Cohere, as well as learning design frameworks like action mapping, to develop my own approach. The most successful patterns emerged from models of guided learning: providing scaffolding that gradually fades as learner competence grows.
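That fading-scaffold idea can be expressed in a few lines. The thresholds and support levels below are invented for illustration; any real system would tune them against learner data:

```python
# Illustrative only: map recent rubric scores to a scaffolding level so
# support fades as competence grows. The thresholds are invented.
SCAFFOLDING = {
    "high": "Offer sentence starters and name the technique to try next.",
    "medium": "Hint at which skill to apply, but let the learner phrase it.",
    "low": "Stay silent during the conversation; give feedback only at the end.",
}

def scaffolding_level(recent_scores: list[float], max_score: float = 6.0) -> str:
    """Fade support as the rolling average approaches the maximum score."""
    if not recent_scores:
        return "high"  # new learners get full support
    ratio = sum(recent_scores) / (len(recent_scores) * max_score)
    if ratio < 0.5:
        return "high"
    if ratio < 0.8:
        return "medium"
    return "low"

# A learner averaging 5.5 of 6 over recent sessions gets minimal support:
print(SCAFFOLDING[scaffolding_level([5.0, 5.5, 6.0])])
```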

Each iteration of our prompting approach became more nuanced, incorporating contextual awareness, learning objectives, and behavioral science principles while maintaining the authenticity needed for meaningful practice. And staying on top of best practices—especially in a field evolving week to week—became part of the job.


Balancing AI and Human Elements

Perhaps the most profound insight came from understanding the appropriate balance between automation and human touch. Unlike tools designed to replace human interaction, I found that AI learning experiences work best when they amplify rather than substitute human capabilities.

In practice, this meant designing systems where AI handles the heavy lifting of scenario variation, personalized feedback, and adaptive complexity—freeing human coaches to focus on nuanced emotional intelligence, complex edge cases, and relationship building. This complementary approach yielded outcomes that neither humans nor AI could achieve independently, especially in communication-intensive domains where both technical accuracy and emotional resonance matter.
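In code, that division of labor often reduces to explicit escalation rules. The signals and thresholds below are hypothetical, but they show the shape of the hand-off:

```python
from dataclasses import dataclass

@dataclass
class SessionSignals:
    """Signals a practice session emits; names and ranges are illustrative."""
    rubric_confidence: float    # evaluator's self-reported confidence, 0-1
    emotional_intensity: float  # e.g., sentiment extremity in the dialogue, 0-1
    repeated_failures: int      # times the learner failed this scenario in a row

def should_escalate(s: SessionSignals) -> bool:
    """Route the learner to a human coach when the AI is out of its depth."""
    return (
        s.rubric_confidence < 0.6
        or s.emotional_intensity > 0.8
        or s.repeated_failures >= 3
    )

print(should_escalate(SessionSignals(0.9, 0.4, 0)))  # False: AI keeps coaching
print(should_escalate(SessionSignals(0.5, 0.4, 0)))  # True: low confidence
```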

I've seen similar philosophies at work in platforms like SkillGym or Rehearsable.ai, where AI provides initial feedback on scenario-based learning but escalates to human coaches for more complex situations. It's not about replacing human guidance—it's about making it more scalable and accessible.


Measuring What Matters

Measuring the effectiveness of AI-powered learning experiences came with its own set of challenges. Traditional completion rates or satisfaction surveys weren’t enough—they couldn’t capture how well learners applied their skills on the job or how those skills influenced broader business outcomes.

Understanding that impact required going deeper.

We looked at how this new type of learning shaped behavior, built confidence, and influenced team performance—not just whether someone finished a module. We ran A/B tests, gathered continuous feedback, and iterated on the experience. Just as importantly, we introduced methods to evaluate and audit AI-generated feedback with human reviewers in the loop to ensure quality, fairness, and accuracy.
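A human-in-the-loop audit can start as simply as sampling the AI's feedback for review. The sketch below is illustrative; the flagged field and the sampling rate are assumptions, not the pipeline we used:

```python
import random

def sample_for_audit(feedback_log: list[dict], rate: float = 0.05,
                     seed: int | None = None) -> list[dict]:
    """Pick AI-generated feedback items for human review.

    Flagged items (low model confidence, learner disputes) are always
    included; the remainder is randomly sampled at `rate`.
    """
    rng = random.Random(seed)
    flagged = [item for item in feedback_log if item.get("flagged")]
    rest = [item for item in feedback_log if not item.get("flagged")]
    k = min(len(rest), max(1, round(len(rest) * rate)))
    return flagged + rng.sample(rest, k)
```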

What emerged was a more holistic view of learning effectiveness—one that prioritized real-world behavior change, skill development, and adaptability in varied contexts over checkboxes and completion stats. This approach not only gave us richer insights, but also helped build trust in the product we were creating and show measurable ROI.


A Few Things That Clicked Along the Way

Throughout this journey, a few principles kept surfacing—reminders that helped shape both the product and the process.

  • Proximity to the problem matters. The closer I stayed to learners and their challenges, the better the product became.

  • Being hands-on builds credibility. Prototyping, testing, and refining ideas directly gave me the insight I needed to communicate the product’s value clearly.

  • Partnerships unlock scale. Working with key partners, such as AWS, Slalom and Anthropic, gave us the infrastructure and expertise to move from concept to deployment securely and efficiently.

  • AI should make learning feel more human. Our goal was never to automate people out of the process; it was to create better coaching, better scenarios, and more confident learners.

  • Team diversity creates better products. The engineers, designers, localization experts, and subject matter experts who joined the journey each brought perspectives that transformed the product in ways I couldn't have imagined alone.


What’s Next: Smarter Experiences, Not Just Smarter Models

I believe we’re just scratching the surface of what GenAI can do for workplace learning. The future lies not just in smarter models, but in smarter experiences—ones that are more adaptive, human-centered, and grounded in real-life challenges.

I’m inspired to share as much of this journey as I can—while continuing to learn from others exploring the same space. There’s still so much to figure out—across models, languages, and use cases—and I’m excited to keep building and improving alongside the community.

A special thank you to Anna Liashenko and Levente Vero, who joined me early in this journey, to Slalom and AWS, and to my leadership team for their continued support and trust.

If you’re working on similar challenges, or just curious about what’s possible—I’d love to connect.

