How I Built a Multi-Modal AI Chatbot on Amazon EKS Using Amazon Q CLI

From concept to deployment: Leveraging AI-powered assistance to orchestrate LLaMA 3.1, FLUX.1-dev, and Streamlit on Kubernetes


🚀 The Challenge: Building Complex AI Infrastructure

Recently, I embarked on an ambitious project: deploying a multi-modal AI chatbot that combines text generation (LLaMA 3.1 8B) and image generation (FLUX.1-dev) on Amazon EKS (see screenshots below). The complexity was daunting:

  • Multiple AI models requiring different GPU types (L4 vs A100)
  • Complex Kubernetes orchestration with auto-scaling and spot instances
  • Integration challenges between services and frontend
  • Cost optimization while maintaining performance
  • Production-ready deployment with monitoring and health checks

What made this project remarkable wasn't just the technical achievement, but how Amazon Q CLI transformed the entire development experience from a weeks-long struggle into a streamlined, AI-assisted journey.

🤖 Enter Amazon Q CLI: My AI-Powered DevOps Partner

Amazon Q CLI isn't just another command-line tool; it's like having a senior AWS architect and Kubernetes expert sitting right beside you. Paired with AWS MCP servers, Amazon Q CLI can interact directly with services like EKS and carry the project through end to end.

This is my prompt strategy:

Your role: You are a Solutions Architect who wants to build a multi-modal chatbot using modern web frameworks that interacts with text and image models from Hugging Face. If a Hugging Face API key is needed, use hf.key, which contains the key. You store all final YAML files in a sub-folder called 'deployments'. You store all documentation in a sub-folder called 'documentation'. You make use of the necessary MCP servers and tools for these tasks. For all interactions with the EKS cluster, you use the EKS MCP server. Plan your tasks so that each step and its status is recorded, allowing tasks to be resumed after any interruption. Do not build anything yet. Wait for my step-by-step instructions.
Step 1: Read AWS documentation to understand how EKS Auto Mode works, specifically how it manages Node Pools. Then, create a cluster called ai-chatbot with EKS Auto Mode enabled. Create a node pool consisting of different g6 and p5 GPU instances. I will use the GPU node pools for LLM inference in the next steps. The inference engine will be vLLM. Prepare appropriate storage services for LLM storage and fast access.
Step 2: Deploy the llama3.1 8B model (see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on a GPU g6.12xlarge instance using the vLLM serving engine. The deployment should have 1 pod. Troubleshoot and correct any errors that occur. Check the deployment regularly until it is fully deployed. After deployment, test the model. Consider that the cluster has EKS Auto Mode enabled in your deployment strategy.
Step 3: Deploy the black-forest-labs/FLUX.1-dev model (see https://huggingface.co/black-forest-labs/FLUX.1-dev) on a GPU p5 instance. The deployment should have 1 pod. Troubleshoot and correct any errors that occur. Check the deployment regularly until it is fully deployed. After deployment, test the model by generating an image. Consider that the cluster has EKS Auto Mode enabled in your deployment strategy.
Step 4: Deploy a front-end application (e.g. Streamlit) using general-purpose compute, making sure it has access to both models deployed in the steps above. Use EFS for persistent storage. Store conversation history and make it accessible in the sidebar.
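Step 4's conversation-history requirement can be sketched with a small persistence helper like the following. This is a minimal illustration only; the paths and function names are my own, not taken from the project. In the cluster, the history directory would sit on the EFS-backed PersistentVolume mounted into the Streamlit pod.

```python
import json
from pathlib import Path

def save_history(history, path):
    """Write the full conversation list as JSON, creating parent dirs as needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(history, indent=2))

def load_history(path):
    """Return the stored conversation list, or [] if no history exists yet."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())

# Example round-trip, using a local path in place of the EFS mount:
history = [{"role": "user", "content": "Draw a sunset"},
           {"role": "assistant", "content": "Here is your image."}]
save_history(history, "/tmp/chat-history/session-1.json")
print(load_history("/tmp/chat-history/session-1.json")[0]["role"])  # prints: user
```

Streamlit's sidebar can then simply iterate over `load_history(...)` to render past turns.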

Here's how Q CLI revolutionized my approach:

1. Intelligent Architecture Planning

Instead of spending hours researching optimal instance types and configurations, I simply asked Q CLI:

"What's the best GPU instance type for running FLUX.1-dev image generation model?"        

Q CLI immediately provided detailed analysis:

  • p4d.24xlarge for A100 GPUs (optimal for FLUX)
  • g6.12xlarge for L4 GPUs (cost-effective for LLaMA)
  • Spot instance strategies with 85% cost savings
  • Memory and compute requirements for each model

Result: Saved 2-3 days of research and testing different configurations.
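For reference, a Karpenter NodePool along these lines can express the spot-plus-GPU strategy. This is a sketch only: the pool name is hypothetical, and the exact API fields should be checked against the Karpenter version running in the cluster.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference          # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # prefer spot, fall back to on-demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g6", "p4d"]            # L4 and A100 GPU families
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule               # keep non-GPU workloads off these nodes
```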

2. Real-Time Troubleshooting

When my FLUX deployment was stuck in "Pending" status, instead of diving into logs manually, I asked:

"My FLUX pod is pending. Can you help diagnose the issue?"        

Q CLI instantly:

  • Analyzed pod events and node availability
  • Identified GPU node provisioning delays
  • Suggested Karpenter configuration optimizations
  • Provided specific commands to monitor progress

The magic moment: Q CLI detected that my readiness probe had a 10-minute initial delay, explaining why the service appeared "offline" when it was actually loading successfully.
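For context, the kind of probe configuration that produces this behavior looks roughly like the fragment below. The values are illustrative rather than the project's exact manifest; the point is that a long initialDelaySeconds keeps the pod out of service rotation while multi-gigabyte model weights load.

```yaml
readinessProbe:
  httpGet:
    path: /health              # vLLM exposes a health endpoint; path is illustrative
    port: 8000
  initialDelaySeconds: 600     # 10 minutes, to allow model weights to load
  periodSeconds: 30
  failureThreshold: 5
```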

3. Configuration Generation and Optimization

Rather than writing YAML manifests from scratch, Q CLI helped generate production-ready configurations:

"Generate a Kubernetes deployment for LLaMA 3.1 8B with vLLM on G6 instances"        

Q CLI produced:

  • Optimized resource requests and limits
  • Proper GPU scheduling and tolerations
  • Health checks with appropriate timeouts
  • Security best practices built-in

Time saved: What would have taken hours of documentation reading and trial-and-error became minutes of AI-assisted configuration.
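To make this concrete, a manifest of the kind described above might look like the following. This is a hedged sketch, not the generated file: the names are my own, and flags like tensor parallelism should be tuned to the instance (a g6.12xlarge carries 4 L4 GPUs).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-31-8b            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: {app: llama-31-8b}
  template:
    metadata:
      labels: {app: llama-31-8b}
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule   # allow scheduling onto tainted GPU nodes
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct",
                 "--tensor-parallel-size", "4"]   # shard across the 4 L4 GPUs
          resources:
            limits:
              nvidia.com/gpu: 4
```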

🔧 Real-World Problem Solving with Q CLI

Challenge 1: LoadBalancer Connectivity Issues

Problem: Users couldn't access the Streamlit frontend via LoadBalancer URL.

Q CLI Assistance:

"The LoadBalancer URL is not accessible. Can you help troubleshoot?"        

Q CLI systematically:

  1. Checked LoadBalancer configuration and security groups
  2. Identified DNS propagation issues
  3. Provided alternative access methods (port forwarding, NodePort)
  4. Generated troubleshooting commands for validation

Outcome: Multiple access paths established, ensuring users could always reach the application.

Challenge 2: GPU Resource Allocation

Problem: FLUX model wasn't utilizing A100 GPU efficiently.

Q CLI Guidance:

"How can I optimize FLUX.1-dev performance on A100 GPUs?"        

Q CLI recommended:

  • Specific CUDA memory allocation configurations
  • PyTorch optimization flags
  • Batch size and inference step tuning
  • Memory management best practices

Result: 40% improvement in image generation speed and GPU utilization.
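CUDA memory settings of the kind mentioned above are typically applied as container environment variables in the pod spec. The project's exact values aren't shown, so the fragment below is illustrative:

```yaml
env:
  - name: PYTORCH_CUDA_ALLOC_CONF
    value: "expandable_segments:True"   # reduce allocator fragmentation on long-running inference
  - name: HF_HOME
    value: "/models/cache"              # keep model weights on fast persistent storage
```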

Challenge 3: Service Integration

Problem: Streamlit frontend couldn't communicate with AI model services.

Q CLI Solution:

"My Streamlit app can't reach the AI services. Help me debug the connectivity."        

Q CLI provided:

  • Service discovery troubleshooting commands
  • Network policy validation
  • DNS resolution testing
  • Step-by-step connectivity verification

Impact: Seamless service integration achieved in minutes instead of hours.
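The fix ultimately comes down to addressing the model services by their cluster-internal DNS names. A minimal sketch of how the frontend can build those URLs and an OpenAI-compatible request body (as served by vLLM) is below; the service and namespace names are hypothetical, not taken from the project.

```python
def service_url(service, namespace, port, path=""):
    """Build the cluster-internal URL for a Kubernetes Service."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"

def chat_payload(prompt, model="meta-llama/Llama-3.1-8B-Instruct"):
    """OpenAI-compatible chat request body, as accepted by vLLM's server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

url = service_url("llama-vllm", "ai-chatbot", 8000, "/v1/chat/completions")
print(url)  # prints: http://llama-vllm.ai-chatbot.svc.cluster.local:8000/v1/chat/completions
```

The Streamlit app would POST `chat_payload(...)` to that URL; if the name doesn't resolve, the problem is service discovery (wrong namespace or Service name), not the model.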

📊 The Results: A Production-Ready Multi-Modal AI System

Thanks to Q CLI's assistance, I successfully deployed:

🏗️ Architecture Achieved

  • Frontend: Streamlit web interface on cost-optimized compute
  • Text AI: LLaMA 3.1 8B on g6.12xlarge (L4 GPUs)
  • Image AI: FLUX.1-dev on p4d.24xlarge (A100 GPUs)
  • Infrastructure: Auto-scaling EKS with Karpenter and spot instances

📈 Performance Metrics

  • Text Generation: 2-5 seconds response time
  • Image Generation: 20-30 seconds for 1024x1024 images
  • Cost Optimization: 85% savings with spot instances
  • Availability: Multi-AZ deployment with health monitoring

🎯 User Experience

  • Multi-Modal Chat: Text, image, or combined responses
  • Real-Time Monitoring: Live service health status
  • Configurable Settings: Adjustable generation parameters
  • Multiple Access Methods: LoadBalancer, NodePort, port forwarding

🛠️ Q CLI's Game-Changing Capabilities

1. Contextual Problem Solving

Q CLI doesn't just provide generic answers—it understands your specific environment and provides targeted solutions.

2. Multi-Service Orchestration

Managing complex deployments across multiple services becomes manageable with Q CLI's holistic approach.

3. Best Practices Integration

Every suggestion incorporates AWS Well-Architected principles and Kubernetes best practices.

4. Real-Time Assistance

No more context switching between documentation, forums, and terminals—Q CLI provides immediate, relevant help.

5. Learning Accelerator

Q CLI doesn't just solve problems; it explains the reasoning, helping you learn and improve.

📚 The Documentation Revolution

One of Q CLI's most impressive features was helping me create comprehensive project documentation:

"Generate comprehensive documentation for this multi-modal AI project"        

Q CLI produced:

  • 17,000+ words of detailed documentation
  • Step-by-step deployment guides
  • Configuration references
  • Troubleshooting runbooks
  • Sample prompts and testing guides

The result: A complete package that anyone can use to redeploy the entire system.

🎯 Key Takeaways for Fellow Engineers

1. AI-Assisted Development is Here

Amazon Q CLI represents a fundamental shift in how we approach complex infrastructure projects. It's not replacing our expertise—it's amplifying it.

2. Time-to-Value Acceleration

What traditionally takes weeks of research, trial-and-error, and debugging can now be accomplished in days with AI assistance.

3. Quality and Best Practices

Q CLI doesn't just help you build faster—it helps you build better, with security, scalability, and cost optimization built-in.

4. Learning and Growth

The explanatory nature of Q CLI's responses makes it an excellent learning tool, helping you understand the "why" behind recommendations.

5. Documentation Excellence

AI-assisted documentation creation ensures comprehensive, accurate, and maintainable project artifacts.

🚀 The Future of Infrastructure Development

This project demonstrated that we're entering a new era where:

  • Complex deployments become accessible to more engineers
  • Best practices are automatically incorporated
  • Troubleshooting becomes collaborative with AI
  • Documentation is comprehensive and current
  • Learning curves are dramatically reduced

💡 Try It Yourself

If you're working on AWS infrastructure projects, I highly recommend exploring Amazon Q CLI. Whether you're:

  • Deploying AI workloads on EKS
  • Optimizing costs with spot instances
  • Troubleshooting complex service interactions
  • Creating documentation for your projects
  • Learning new AWS services and patterns

Amazon Q CLI can transform your development experience.

🔗 Project Artifacts

The complete multi-modal AI chatbot project, including all configurations, documentation, and deployment guides, is now available as a production-ready package. The system demonstrates:

  • Multi-modal AI capabilities (text + image generation)
  • Production-grade architecture with auto-scaling and monitoring
  • Cost optimization strategies with spot instances
  • Comprehensive documentation for redeployment

🎉 Final Thoughts

Building this multi-modal AI chatbot was an incredible journey, made possible by the power of Amazon Q CLI. It's not just about the technology we deployed—it's about how AI-assisted development is changing the game for all of us.

The future of infrastructure development is collaborative, intelligent, and incredibly exciting. Amazon Q CLI is leading that transformation.


What's your experience with AI-assisted development? Have you tried Amazon Q CLI for your AWS projects? I'd love to hear about your experiences and challenges in the comments!

#AWS #AmazonQ #EKS #AI #MachineLearning #Kubernetes #DevOps #CloudComputing #ArtificialIntelligence #Innovation


🔗 Connect with me to discuss AI infrastructure, cloud architecture, and the future of development tools. Always happy to share experiences and learn from the community!


Screenshots:

  • Amazon Q CLI
  • Amazon EKS Resources in K9S
  • Chatbot
  • Documentation
  • OpenWebUI Example (in another cluster)
