Bank Customer Churn Demo with Machine Learning and AI as Needed
Customer churn prediction is one of the classic problems in data science. Banks, telecoms, and subscription services all face the same question: which customers are most likely to leave, and why? Solving this helps organizations act early with retention strategies.
When I set out to build this demo, my original idea was to show a two-step approach:

1. Use native Greenplum machine learning capabilities with Apache MADlib.
2. Then layer on AI with more advanced models to show improvements.
But here’s the surprise: the Greenplum-native machine learning results were so strong that there was no need to move on to the AI models at all. For this churn prediction dataset, a simple decision tree trained with Apache MADlib inside Greenplum gave excellent performance right out of the box.
How I Built It
The starting point was gpmlbot, a tool that tries multiple model families automatically, evaluates them against the dataset at hand, and recommends which ones are most likely to succeed. By running live experiments and generating SQL code, gpmlbot let me converge quickly on the right approach for churn prediction in Greenplum. It tested several candidates, including logistic regression, random forest, and decision trees, and identified the decision tree classifier as the best match for this dataset.
I used a Kaggle bank customer churn dataset covering 10,000 sample customers. The data was loaded into Greenplum in under one second with a single SQL command, referenced in my repository.
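The repo's load.sql holds the actual command; as a minimal sketch, a single-statement load of a Kaggle CSV could look like the following (the table name and file path here are assumptions, not the repo's real ones):

-- Illustrative single-statement load of the Kaggle CSV into Greenplum.
-- Table name and file path are placeholders; see load.sql for the real command.
COPY bank_churn
FROM '/data/bank_churn.csv'
WITH (FORMAT csv, HEADER true);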
The way machine learning works in Greenplum is straightforward: a SQL command is executed to train a model, and the result is a model output table that can be applied to predict outcomes on new datasets. I validated the model by testing it against data that was not part of the training set. In this case, the accuracy of the decision tree model was over 99 percent, with 9,986 out of 10,000 predictions correct.
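The real commands live in train_decisiontree.sql and final_validation.sql in the repo; as a rough sketch of what MADlib's decision-tree API looks like, the train-predict-validate cycle could be written like this (all table and column names below are assumptions for illustration):

-- Train a decision tree in-database with MADlib.
SELECT madlib.tree_train(
    'bank_churn_train',     -- training data table
    'churn_tree_model',     -- output model table
    'customer_id',          -- row id column
    'exited',               -- dependent variable (1 = churned)
    'credit_score, age, tenure, balance, num_products, is_active',  -- features
    NULL,                   -- columns to exclude
    'gini',                 -- split criterion
    NULL, NULL,             -- no grouping columns, no weights
    10                      -- maximum tree depth
);

-- Apply the trained model to held-out rows.
SELECT madlib.tree_predict(
    'churn_tree_model',     -- model table produced by tree_train
    'bank_churn_test',      -- data to score
    'churn_predictions',    -- output table of predictions
    'response'              -- return the predicted class label
);

-- Accuracy: the fraction of predictions that match the actual label.
SELECT round(100.0 * sum((p.estimated_exited = t.exited)::int) / count(*), 2)
       AS accuracy_pct
FROM churn_predictions p
JOIN bank_churn_test t USING (customer_id);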
The Final Workflow
Step 1: Create the schema using bankchurn_schema.sql.
Step 2: Load the dataset using load.sql.
Step 3: Train the model using train_decisiontree.sql.
Step 4: Validate the model using final_validation.sql.

The repo is here if you want to give it a try: https://github.com/ivannovick/bankchurn
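To run the whole flow end to end, the four scripts can be executed back to back from a psql session connected to the target Greenplum database (the script names come straight from the repo):

-- Step 1: create the schema
\i bankchurn_schema.sql
-- Step 2: load the Kaggle dataset
\i load.sql
-- Step 3: train the MADlib decision tree
\i train_decisiontree.sql
-- Step 4: validate against held-out data
\i final_validation.sql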
Closing Thoughts
What began as a plan to contrast Greenplum-native ML with more advanced AI techniques ended up proving something different: Greenplum’s in-database machine learning is powerful enough on its own for real-world business problems like churn prediction.
The combination of Apache MADlib’s algorithms, Greenplum’s parallelism, and gpmlbot’s ability to try different models, run rapid experiments, and generate code made solving this problem not only possible but efficient. gpmlbot’s recommendation of the decision tree classifier, and the process of training, validating, and exporting that model, showed how quickly churn prediction can be operationalized inside Greenplum. In the end, the native Greenplum decision tree performed so well that the AI layer wasn’t needed at all, though it remains available for future use cases that demand it.