How I Built a Multi-Modal AI Chatbot on Amazon EKS Using Amazon Q CLI

From concept to deployment: Leveraging AI-powered assistance to orchestrate LLaMA 3.1, FLUX.1-dev, and Streamlit on Kubernetes


🚀 The Challenge: Building Complex AI Infrastructure

Recently, I embarked on an ambitious project: deploying a multi-modal AI chatbot that combines text generation (LLaMA 3.1 8B) and image generation (FLUX.1-dev) on Amazon EKS (see screenshots below). The complexity was daunting:

  • Multiple AI models requiring different GPU types (L4 vs A100)
  • Complex Kubernetes orchestration with auto-scaling and spot instances
  • Integration challenges between services and frontend
  • Cost optimization while maintaining performance
  • Production-ready deployment with monitoring and health checks

What made this project remarkable wasn't just the technical achievement, but how Amazon Q CLI transformed the entire development experience from a weeks-long struggle into a streamlined, AI-assisted journey.

🤖 Enter Amazon Q CLI: My AI-Powered DevOps Partner

Amazon Q CLI isn't just another command-line tool; it's like having a senior AWS architect and Kubernetes expert sitting right beside you. Paired with AWS MCP servers, Amazon Q CLI can interact directly with services like EKS and carry the project through end to end.

This is my prompt strategy:

Your role: You are a Solutions Architect who wants to build a multi-modal chatbot using modern web frameworks that interacts with text and image models from Hugging Face. If a Hugging Face API key is needed, use hf.key, which contains the key. You store all final YAML files in a sub-folder called 'deployments'. You store all documentation in a sub-folder called 'documentation'. You make use of the necessary MCP servers and tools for these tasks. For all interactions with the EKS cluster, you use the EKS MCP server. Plan your tasks so that each step and its status is recorded, allowing tasks to be resumed after any interruption. Do not build anything yet. Wait for my step-by-step instructions.
Step 1: Read AWS documentation to understand how EKS Auto Mode works, specifically how it manages Node Pools. Then, create a cluster called ai-chatbot with EKS Auto Mode enabled. Create a node pool consisting of different g6 and p5 GPU instances. I will use the GPU node pools for LLM inference in the next steps. The inference engine will be vLLM. Prepare appropriate storage services for LLM storage and fast access.
Step 2: Deploy the llama3.1 8B model (see https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on a GPU g6.12xlarge instance using the vLLM serving engine. The deployment should have 1 pod. Troubleshoot and correct any errors that occur. Check the deployment regularly until it is fully deployed. After deployment, test the model. Consider that the cluster has EKS Auto Mode enabled in your deployment strategy.
Step 3: Deploy the black-forest-labs/FLUX.1-dev model (see https://huggingface.co/black-forest-labs/FLUX.1-dev) on a GPU p5 instance. The deployment should have 1 pod. Troubleshoot and correct any errors that occur. Check the deployment regularly until it is fully deployed. After deployment, test the model by generating an image. Consider that the cluster has EKS Auto Mode enabled in your deployment strategy.
Step 4: Deploy a front-end application (e.g. Streamlit) using general-purpose compute, making sure it has access to both models deployed in the steps above. Use EFS for persistent storage. Store conversation history and make it accessible in the sidebar.
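Step 4's conversation-history requirement can be sketched with a small persistence helper like the following. This is a minimal illustration only; the paths and function names are my own, not taken from the project. In the cluster, the history directory would sit on the EFS-backed PersistentVolume mounted into the Streamlit pod.

```python
import json
from pathlib import Path

def save_history(history, path):
    """Write the full conversation list as JSON, creating parent dirs as needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(history, indent=2))

def load_history(path):
    """Return the stored conversation list, or [] if no history exists yet."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())

# Example round-trip, using a local path in place of the EFS mount:
history = [{"role": "user", "content": "Draw a sunset"},
           {"role": "assistant", "content": "Here is your image."}]
save_history(history, "/tmp/chat-history/session-1.json")
print(load_history("/tmp/chat-history/session-1.json")[0]["role"])  # prints: user
```

Streamlit's sidebar can then simply iterate over `load_history(...)` to render past turns.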

Here's how Q CLI revolutionized my approach:

1. Intelligent Architecture Planning

Instead of spending hours researching optimal instance types and configurations, I simply asked Q CLI:

"What's the best GPU instance type for running FLUX.1-dev image generation model?"        

Q CLI immediately provided detailed analysis:

  • p4d.24xlarge for A100 GPUs (optimal for FLUX)
  • g6.12xlarge for L4 GPUs (cost-effective for LLaMA)
  • Spot instance strategies with 85% cost savings
  • Memory and compute requirements for each model

Result: Saved 2-3 days of research and testing different configurations.
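For reference, a Karpenter NodePool along these lines can express the spot-plus-GPU strategy. This is a sketch only: the pool name is hypothetical, and the exact API fields should be checked against the Karpenter version running in the cluster.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference          # hypothetical name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # prefer spot, fall back to on-demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g6", "p4d"]            # L4 and A100 GPU families
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule               # keep non-GPU workloads off these nodes
```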

2. Real-Time Troubleshooting

When my FLUX deployment was stuck in "Pending" status, instead of diving into logs manually, I asked:

"My FLUX pod is pending. Can you help diagnose the issue?"        

Q CLI instantly:

  • Analyzed pod events and node availability
  • Identified GPU node provisioning delays
  • Suggested Karpenter configuration optimizations
  • Provided specific commands to monitor progress

The magic moment: Q CLI detected that my readiness probe had a 10-minute initial delay, explaining why the service appeared "offline" when it was actually loading successfully.
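For context, the kind of probe configuration that produces this behavior looks roughly like the fragment below. The values are illustrative rather than the project's exact manifest; the point is that a long initialDelaySeconds keeps the pod out of service rotation while multi-gigabyte model weights load.

```yaml
readinessProbe:
  httpGet:
    path: /health              # vLLM exposes a health endpoint; path is illustrative
    port: 8000
  initialDelaySeconds: 600     # 10 minutes, to allow model weights to load
  periodSeconds: 30
  failureThreshold: 5
```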

3. Configuration Generation and Optimization

Rather than writing YAML manifests from scratch, Q CLI helped generate production-ready configurations:

"Generate a Kubernetes deployment for LLaMA 3.1 8B with vLLM on G6 instances"        

Q CLI produced:

  • Optimized resource requests and limits
  • Proper GPU scheduling and tolerations
  • Health checks with appropriate timeouts
  • Security best practices built-in

Time saved: What would have taken hours of documentation reading and trial-and-error became minutes of AI-assisted configuration.
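To make this concrete, a manifest of the kind described above might look like the following. This is a hedged sketch, not the generated file: the names are my own, and flags like tensor parallelism should be tuned to the instance (a g6.12xlarge carries 4 L4 GPUs).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-31-8b            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: {app: llama-31-8b}
  template:
    metadata:
      labels: {app: llama-31-8b}
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule   # allow scheduling onto tainted GPU nodes
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct",
                 "--tensor-parallel-size", "4"]   # shard across the 4 L4 GPUs
          resources:
            limits:
              nvidia.com/gpu: 4
```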

🔧 Real-World Problem Solving with Q CLI

Challenge 1: LoadBalancer Connectivity Issues

Problem: Users couldn't access the Streamlit frontend via LoadBalancer URL.

Q CLI Assistance:

"The LoadBalancer URL is not accessible. Can you help troubleshoot?"        

Q CLI systematically:

  1. Checked LoadBalancer configuration and security groups
  2. Identified DNS propagation issues
  3. Provided alternative access methods (port forwarding, NodePort)
  4. Generated troubleshooting commands for validation

Outcome: Multiple access paths established, ensuring users could always reach the application.

Challenge 2: GPU Resource Allocation

Problem: FLUX model wasn't utilizing A100 GPU efficiently.

Q CLI Guidance:

"How can I optimize FLUX.1-dev performance on A100 GPUs?"        

Q CLI recommended:

  • Specific CUDA memory allocation configurations
  • PyTorch optimization flags
  • Batch size and inference step tuning
  • Memory management best practices

Result: 40% improvement in image generation speed and GPU utilization.
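CUDA memory settings of the kind mentioned above are typically applied as container environment variables in the pod spec. The project's exact values aren't shown, so the fragment below is illustrative:

```yaml
env:
  - name: PYTORCH_CUDA_ALLOC_CONF
    value: "expandable_segments:True"   # reduce allocator fragmentation on long-running inference
  - name: HF_HOME
    value: "/models/cache"              # keep model weights on fast persistent storage
```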

Challenge 3: Service Integration

Problem: Streamlit frontend couldn't communicate with AI model services.

Q CLI Solution:

"My Streamlit app can't reach the AI services. Help me debug the connectivity."        

Q CLI provided:

  • Service discovery troubleshooting commands
  • Network policy validation
  • DNS resolution testing
  • Step-by-step connectivity verification

Impact: Seamless service integration achieved in minutes instead of hours.
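The fix ultimately comes down to addressing the model services by their cluster-internal DNS names. A minimal sketch of how the frontend can build those URLs and an OpenAI-compatible request body (as served by vLLM) is below; the service and namespace names are hypothetical, not taken from the project.

```python
def service_url(service, namespace, port, path=""):
    """Build the cluster-internal URL for a Kubernetes Service."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"

def chat_payload(prompt, model="meta-llama/Llama-3.1-8B-Instruct"):
    """OpenAI-compatible chat request body, as accepted by vLLM's server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

url = service_url("llama-vllm", "ai-chatbot", 8000, "/v1/chat/completions")
print(url)  # prints: http://llama-vllm.ai-chatbot.svc.cluster.local:8000/v1/chat/completions
```

The Streamlit app would POST `chat_payload(...)` to that URL; if the name doesn't resolve, the problem is service discovery (wrong namespace or Service name), not the model.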

📊 The Results: A Production-Ready Multi-Modal AI System

Thanks to Q CLI's assistance, I successfully deployed:

🏗️ Architecture Achieved

  • Frontend: Streamlit web interface on cost-optimized compute
  • Text AI: LLaMA 3.1 8B on g6.12xlarge (L4 GPUs)
  • Image AI: FLUX.1-dev on p4d.24xlarge (A100 GPUs)
  • Infrastructure: Auto-scaling EKS with Karpenter and spot instances

📈 Performance Metrics

  • Text Generation: 2-5 seconds response time
  • Image Generation: 20-30 seconds for 1024x1024 images
  • Cost Optimization: 85% savings with spot instances
  • Availability: Multi-AZ deployment with health monitoring

🎯 User Experience

  • Multi-Modal Chat: Text, image, or combined responses
  • Real-Time Monitoring: Live service health status
  • Configurable Settings: Adjustable generation parameters
  • Multiple Access Methods: LoadBalancer, NodePort, port forwarding

🛠️ Q CLI's Game-Changing Capabilities

1. Contextual Problem Solving

Q CLI doesn't just provide generic answers—it understands your specific environment and provides targeted solutions.

2. Multi-Service Orchestration

Managing complex deployments across multiple services becomes manageable with Q CLI's holistic approach.

3. Best Practices Integration

Every suggestion incorporates AWS Well-Architected principles and Kubernetes best practices.

4. Real-Time Assistance

No more context switching between documentation, forums, and terminals—Q CLI provides immediate, relevant help.

5. Learning Accelerator

Q CLI doesn't just solve problems; it explains the reasoning, helping you learn and improve.

📚 The Documentation Revolution

One of Q CLI's most impressive features was helping me create comprehensive project documentation:

"Generate comprehensive documentation for this multi-modal AI project"        

Q CLI produced:

  • 17,000+ words of detailed documentation
  • Step-by-step deployment guides
  • Configuration references
  • Troubleshooting runbooks
  • Sample prompts and testing guides

The result: A complete package that anyone can use to redeploy the entire system.

🎯 Key Takeaways for Fellow Engineers

1. AI-Assisted Development is Here

Amazon Q CLI represents a fundamental shift in how we approach complex infrastructure projects. It's not replacing our expertise—it's amplifying it.

2. Time-to-Value Acceleration

What traditionally takes weeks of research, trial-and-error, and debugging can now be accomplished in days with AI assistance.

3. Quality and Best Practices

Q CLI doesn't just help you build faster—it helps you build better, with security, scalability, and cost optimization built-in.

4. Learning and Growth

The explanatory nature of Q CLI's responses makes it an excellent learning tool, helping you understand the "why" behind recommendations.

5. Documentation Excellence

AI-assisted documentation creation ensures comprehensive, accurate, and maintainable project artifacts.

🚀 The Future of Infrastructure Development

This project demonstrated that we're entering a new era where:

  • Complex deployments become accessible to more engineers
  • Best practices are automatically incorporated
  • Troubleshooting becomes collaborative with AI
  • Documentation is comprehensive and current
  • Learning curves are dramatically reduced

💡 Try It Yourself

If you're working on AWS infrastructure projects, I highly recommend exploring Amazon Q CLI. Whether you're:

  • Deploying AI workloads on EKS
  • Optimizing costs with spot instances
  • Troubleshooting complex service interactions
  • Creating documentation for your projects
  • Learning new AWS services and patterns

Amazon Q CLI can transform your development experience.

🔗 Project Artifacts

The complete multi-modal AI chatbot project, including all configurations, documentation, and deployment guides, is now available as a production-ready package. The system demonstrates:

  • Multi-modal AI capabilities (text + image generation)
  • Production-grade architecture with auto-scaling and monitoring
  • Cost optimization strategies with spot instances
  • Comprehensive documentation for redeployment

🎉 Final Thoughts

Building this multi-modal AI chatbot was an incredible journey, made possible by the power of Amazon Q CLI. It's not just about the technology we deployed—it's about how AI-assisted development is changing the game for all of us.

The future of infrastructure development is collaborative, intelligent, and incredibly exciting. Amazon Q CLI is leading that transformation.


What's your experience with AI-assisted development? Have you tried Amazon Q CLI for your AWS projects? I'd love to hear about your experiences and challenges in the comments!

#AWS #AmazonQ #EKS #AI #MachineLearning #Kubernetes #DevOps #CloudComputing #ArtificialIntelligence #Innovation


🔗 Connect with me to discuss AI infrastructure, cloud architecture, and the future of development tools. Always happy to share experiences and learn from the community!


Screenshots:

  • Amazon Q CLI
  • Amazon EKS Resources in K9S
  • Chatbot
  • Documentation
  • OpenWebUI Example (in another cluster)
