A sophisticated content recommendation system powered by VMware Tanzu Greenplum, pgvector, and AI-generated embeddings. This project demonstrates how to build a complete AI-driven recommendation pipeline with a stunning modern web interface featuring real-time clustering and metallic design.
- Glass morphism design with metallic gradients and effects
- Interactive cluster rating system with custom metallic sliders
- Real-time re-clustering with adjustable parameters (5-25 clusters)
- Responsive design that works beautifully on all devices
- Dark theme with gold accents and shimmer animations
- Embedding generation using configurable AI models (OpenAI or local)
- KMeans clustering with PL/Python and scikit-learn
- Vector similarity search using pgvector for recommendations
- AI-generated cluster summaries for better user understanding
- Personalized preference modeling based on cluster ratings
- Dynamic re-clustering - Change cluster count and regenerate groupings
- Configurable embedding dimensions (supports 768D, 1536D, etc.)
- Local or cloud AI models via OpenAI-compatible APIs
- Export functionality for recommendations and analytics
- Comprehensive pre-flight checks for data integrity
- Greenplum Database 7.x with pgvector extension
- Python 3.8+ with virtual environment support
- AI Model Access (OpenAI API or local model server)
-
Clone and setup:
git clone <repository-url> cd blog-recommendations python3 -m venv venv source venv/bin/activate pip install -r requirements.txt
-
Download the dataset:
# Easy one-click download ./download_data.sh
Alternative manual download:
# Manual download and extract curl -o medium_data.csv.zip "https://storage.googleapis.com/kaggle-data-sets/748442/1294572/compressed/medium_data.csv.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20250915%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20250915T195945Z&X-Goog-Expires=259200&X-Goog-SignedHeaders=host&X-Goog-Signature=2641e2acbb04a08f983895bca92680cf78ecc5a4cb273a5b78d249e3b8a780e7da9481fd482a56947908916fdaa3f66dca87644ebc4f3d20a93d16ba18ecd865a60db000af100e0c328a7f028dc30700c23a9fccc34027d187402393331624ca3e45fc5b9d64cdf9dbe84f7774e0bdfcbbff28f1dd12ee662ff112d0fc082776d8b5b57cf5f69892887b03a71a742565f8d79c7149cedb9530e96a4c9bab8ec051d9505105b7c56da4ba145bcea0f9df48e80dae9f6c0e241fe741a119f9c01206974f8c36a5c123be324428df89be0f0024ea7d52e374409e15529a75ae4e70da7ee756a18b3eb8da6b18a9abb3b5515c8afffa81c753301a711947a4005107" unzip medium_data.csv.zip mv medium_data.csv medium_post_titles.csv rm medium_data.csv.zip
-
Configure environment (edit
.env
):# Database Configuration GP_HOST=localhost GP_PORT=15432 GP_USER=gpadmin DB_NAME=demo # AI Model Configuration USE_LOCAL_MODELS=true LOCAL_API_BASE=http://127.0.0.1:1234/v1 EMBEDDING_MODEL=text-embedding-nomic-embed-text-v2 EMBEDDING_DIMENSIONS=768 # Web Interface WEB_PORT=8081
-
Run the complete demo:
./run_demo.sh
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Raw CSV Data β -> β Greenplum + AI β -> β Web Interface β
β (Blog Posts) β β Processing β β (Metallic UI) β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β
βΌ
ββββββββββββββββββββ
β Recommendation β
β Engine β
ββββββββββββββββββββ
- Load β Blog posts imported into Greenplum
- Embed β AI generates vector embeddings for each post
- Cluster β KMeans groups similar content (configurable clusters)
- Summarize β AI creates thematic summaries for each cluster
- Recommend β Vector similarity finds personalized content
- View AI-generated cluster summaries with sample articles
- Adjust cluster count (5-25) using the metallic slider
- Click "Re-cluster Data" to dynamically reorganize content
- Rate each cluster from 1-10 using metallic sliders
- Build your personalized preference vector in real-time
- Visual feedback with gold accent colors
- Get top 25 most interesting articles for you
- See 25 least relevant articles (for comparison)
- Export results with metadata and system information
Component | Technology | Purpose |
---|---|---|
Database | Greenplum 7.x | Scalable data warehouse |
Vector Search | pgvector | Similarity search & storage |
Clustering | PL/Python + scikit-learn | KMeans clustering |
AI Models | OpenAI API / Local LLMs | Embeddings & summaries |
Backend | Flask + Python | REST API server |
Frontend | HTML/CSS/JS + Tailwind | Metallic web interface |
Config | Environment variables | Flexible configuration |
blog-recommendations/
βββ π web_app.py # Flask web server with REST API
βββ βοΈ config.py # Configuration management
βββ π§ run_demo.sh # One-click demo launcher
βββ π genvec.py # Embedding generation
βββ π― cluster.sql # KMeans clustering query
βββ π summarize.py # AI cluster summaries
βββ ποΈ load_data.sh # Data pipeline setup
βββ π§ create_kmeans_function.sql # PL/Python KMeans UDF
βββ π generate_schema.py # Dynamic schema generation
βββ π templates/
β βββ index.html # Metallic UI template
βββ π± static/
β βββ js/app.js # Frontend JavaScript
βββ π scripts/ # Utility scripts
βββ πΈ screenshots/ # Demo screenshots
// Via Web UI - adjust slider and click "Re-cluster Data"
// Or via API:
curl -X POST http://localhost:8081/api/recluster \
-H "Content-Type: application/json" \
-d '{"num_clusters": 20}'
# In .env file:
EMBEDDING_MODEL=text-embedding-3-large
EMBEDDING_DIMENSIONS=3072
# Complete pipeline:
bash load_data.sh # Load CSV data
python genvec.py # Generate embeddings
psql -f create_kmeans_function.sql demo # Create KMeans function
psql -f cluster.sql demo # Run clustering
python summarize.py # Generate summaries
python web_app.py # Start web interface
GP_HOST
,GP_PORT
,GP_USER
- Database connection parametersDB_NAME
- Target database name
USE_LOCAL_MODELS
- Toggle between local and cloud AI modelsEMBEDDING_MODEL
- Model for generating embeddingsEMBEDDING_DIMENSIONS
- Vector dimensions (768, 1536, 3072, etc.)CHAT_MODEL
- Model for generating cluster summaries
WEB_PORT
- Flask server port (default: 8081)CLUSTER_SAMPLE_SIZE
- Articles per cluster for summaries
The metallic theme uses CSS custom properties for easy customization:
/* Metallic color scheme */
.metallic-bg { /* Silver gradients */ }
.metallic-blue { /* Blue metallic accents */ }
.metallic-gold { /* Gold highlights */ }
.glass-panel { /* Glassmorphism effects */ }
- Vector Search: pgvector provides fast similarity search at scale
- Clustering: PL/Python enables in-database ML processing
- Caching: Configuration display caching for faster startups
- Responsive: UI adapts from mobile to large displays
- Media & Publishing: Personalized article feeds and content discovery
- E-commerce: Product recommendations based on user behavior
- Education: Adaptive learning content recommendations
- Research: Academic paper recommendation systems
- Corporate: Knowledge base content suggestion
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature
) - Test with your Greenplum setup
- Commit your changes (
git commit -m 'Add amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Submit a pull request
This project demonstrates advanced Greenplum + AI capabilities for educational and commercial use.
Built with β€οΈ using Greenplum, pgvector, and modern web technologies
Showcasing the power of in-database AI and vector search for next-generation recommendation systems