Fighting Financial Crime with Artificial Intelligence
Tim Seears
Area Practice Director, Data Science – Asia Pacific & Korea
Who Is Think Big?
• 1st Big Data provider 100% focused around open source.
• Founded in 2010; an industry thought leader.
• So far we have delivered 150+ successful projects for 100+ clients worldwide.
• 500+ employees (2017).
• Vendor-neutral with an open source focus.
• Full-spectrum consulting, data engineering, data science & support.
• Apache Hadoop and cloud ecosystem integration.
• Fixed-fee offerings for data science and engineering.
Who Is the Customer?
Leading Nordic universal bank with strong local roots and bridges to the rest of the world.
• Mission: help customers be financially confident and achieve their ambitions by making daily banking and important financial decisions easy.
• Making banking easier for over 145 years.
• 19,000 employees (2017).
• 2.7 million personal customers.
• 236,000 small and medium-sized business customers.
• 1,800 corporate and institutional customers.
• Covering Personal Banking, Business Banking, Corporates & Institutions and Wealth Management.
This Is Happening in the Enterprise Today
PayPal, Walmart, John Deere, Lowe’s, Wells Fargo, JP Morgan
“Over the next decade, AI won’t replace managers, but managers who use AI will replace those who don’t.”
HBR, July 2017
Data-Driven Approach to Fight Fraud
Fraud is evolving rapidly in sophistication; AI is needed to keep pace.
Ambitions for the Fraud Project
• Become the client’s advanced analytics blueprint
• Take a data-driven approach to real-time scoring of transactions
• Reduce false positives and enhance the fraud detection rate
Challenges for Fraud Detection
• Low detection rate: only ~40% of fraud cases are detected
• Many false positives: 99.5% of investigated cases are not fraud related
• High fraud loss: tens of millions of € lost each month
© 2017 Teradata
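The two headline figures describe different things: the ~40% detection rate is recall (share of all fraud that gets flagged), while "99.5% of investigated cases are not fraud" means the precision of the alert queue is only 0.5%. The sketch below computes both from alert counts; the counts and the function name `alert_metrics` are hypothetical, chosen only to reproduce the slide's percentages, not client data.

```python
def alert_metrics(true_alerts, false_alerts, missed_fraud):
    """Return (precision, recall) for a fraud alert queue.

    true_alerts:  flagged cases that turned out to be fraud
    false_alerts: flagged cases that were not fraud (false positives)
    missed_fraud: fraud cases that were never flagged (false negatives)
    """
    precision = true_alerts / (true_alerts + false_alerts)
    recall = true_alerts / (true_alerts + missed_fraud)
    return precision, recall

# Hypothetical month: 20,000 investigated alerts, 100 of them real fraud,
# while 150 further fraud cases slip through unflagged.
precision, recall = alert_metrics(true_alerts=100, false_alerts=19_900, missed_fraud=150)
print(f"precision={precision:.3f}, recall={recall:.2f}")  # precision=0.005, recall=0.40
```

In this illustration investigators review 199 false alarms for every confirmed fraud, which is why reducing false positives matters as much as raising detection.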
Fraud Types – Customer Initiated
• Nigeria/investment scam
• CEO fraud
• Beneficiary account change
• Fake invoice
• Rental scam/goods not received
Fraud Types – Fraudster Initiated
• ID theft
• Spear phishing
• Vishing/support scam
• Malware
• Phishing/smishing
Modeling Challenges
• Class imbalance (100,000:1 non-fraud vs. fraud)
• Assigning fraud labels from historic data
• Fraud is ambiguous
• Not all features are available in real time
• Most machine learning sees each transaction atomically, in isolation from the others
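The 100,000:1 imbalance makes plain accuracy meaningless: a model that never flags anything is almost perfectly "accurate". One common remedy, not necessarily the one used in this project, is to reweight classes by inverse frequency, using the same formula as scikit-learn's `class_weight="balanced"` (n_samples / (n_classes × n_class_samples)). The helper names below are illustrative.

```python
def majority_baseline_accuracy(n_legit, n_fraud):
    """Accuracy of a model that labels every transaction as non-fraud."""
    return n_legit / (n_legit + n_fraud)

def inverse_frequency_weights(n_legit, n_fraud):
    """'Balanced' class weights: n_samples / (n_classes * n_class_samples)."""
    total = n_legit + n_fraud
    return {"legit": total / (2 * n_legit), "fraud": total / (2 * n_fraud)}

n_legit, n_fraud = 100_000, 1  # the 100,000:1 ratio from the slide
print(majority_baseline_accuracy(n_legit, n_fraud))  # roughly 0.99999, yet catches no fraud
weights = inverse_frequency_weights(n_legit, n_fraud)
# Each fraud example now counts as much as ~100,000 legitimate ones in the loss.
print(weights["fraud"] / weights["legit"])
```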
Advanced Platform for Fraud: Data-Driven Approach
How it works:
• Understand the domain
• Gather and prepare the data
• Automatically generate rules and recognize fraudulent patterns by training models on historical data
• Automatically maintain the engine by retraining the model
Pros:
• Automatic, data-driven, objective inference of the rules
• Ability to detect patterns in high-dimensional input data
• Fast detection of new or changing fraudulent patterns
Cons:
• Can be unintuitive and hard to interpret
• Data preparation and feature aggregation are time-consuming
Phase One
Teamwork to Deliver Value in Each Project Iteration
Timeline, September 2016 to July 2017:
• 30 September 2016: project kick-off
• Kick-off of the data scientist track
• 19 December 2016: go-live plan set for the project
• Kick-off of the engineering track
• 4 March 2017: models successfully in shadow production
• First round-trip test transaction
• First production virtual machine
• Production hardening for high availability (HA) and security
• Full productionisation of the fraud engine
Throughout: cross-functional collaboration and continued deep learning modeling.
Banking Anti-Fraud Solution
By leveraging a thoughtful, strong data and analytics strategy, we unlock high-impact business outcomes.
Model Management Framework
• Multiple models running in production at the same time
• A mix of traditional and advanced deep learning methods
• AnalyticsOps: deploying machine learning models in production
Data Modelling, Pipeline and Ingestion
• Organisation and silos of data
• Real-time data integration
• Security and procedures: following existing bank procedures
Machine Learning and Artificial Intelligence
• Insights are hard to operationalize
• Availability of analytic capabilities, skills and data
• Interpreting the results of machine learning models
Advanced Platform for Fraud: Data-Driven Approach
[Figure: payment flow. Payments from eBanking, Mobile Pay, Business Online and the Global Payment Interface are checked by the Central Fraud Engine together with the Advanced Analytics Platform for Fraud, which draws on weblog data, basic customer data, historic transactions, customer product data and aggregated customer data. "Fraud?" No: the payment is processed to the beneficiary. Yes/Maybe: a verification is created for manual evaluation. If manual evaluation confirms fraud ("Yes!"), the payment is returned to the customer and the fraud data is updated; if not ("No"), the payment is processed to the beneficiary.]
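The decision branches in the flow above can be sketched as a small routing function. This is an illustration under stated assumptions: the single score cut-off `REVIEW_THRESHOLD` and the names `route_payment` and `resolve_verification` are hypothetical; the deck does not say how scores map to the Yes/Maybe/No branches.

```python
from enum import Enum

class Action(Enum):
    PROCESS = "process payment to beneficiary"
    VERIFY = "create verification for manual evaluation"

# Hypothetical cut-off; in practice it would be tuned on validation data
# to balance detection rate against investigator workload.
REVIEW_THRESHOLD = 0.5

def route_payment(fraud_score, threshold=REVIEW_THRESHOLD):
    """Map a model's fraud probability to the next step in the payment flow."""
    if fraud_score >= threshold:
        return Action.VERIFY   # "Yes/Maybe" branch
    return Action.PROCESS      # "No" branch

def resolve_verification(confirmed_fraud):
    """Outcome of manual evaluation for a payment held for verification."""
    if confirmed_fraud:
        return ["return payment to customer", "update fraud data"]  # "Yes!"
    return ["process payment to beneficiary"]                       # "No"

print(route_payment(0.72).value)
```

Keeping the threshold as a parameter lets operations trade false positives against missed fraud without retraining the model.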
Framework to Enhance AnalyticsOps Capabilities
[Figure: model management framework linking source systems, the Data Science Lab and production operations. Data Scientists perform data analysis and model development in the Data Science Lab; Data Managers handle data management; Developers & Release Managers handle development and deployment management. Flows between the components: log transactions, add historic data, track model performance, and promote models to the real-time scoring system.]
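The "promote models" step can be illustrated with a champion/challenger rule. This is a hypothetical sketch: the `ModelRecord` fields, the promotion criterion and all numbers are assumptions for illustration, not the framework's actual logic.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    name: str
    false_positive_rate: float  # share of legitimate payments flagged
    detection_rate: float       # share of fraud cases caught

def should_promote(challenger, champion, min_detection_gain=0.0):
    """Promote only if the challenger catches at least as much fraud
    (plus an optional margin) without raising the false-positive rate."""
    return (
        challenger.detection_rate >= champion.detection_rate + min_detection_gain
        and challenger.false_positive_rate <= champion.false_positive_rate
    )

# Hypothetical metrics from shadow production.
champion = ModelRecord("ensemble-v1", false_positive_rate=0.020, detection_rate=0.55)
challenger = ModelRecord("ensemble-v2", false_positive_rate=0.015, detection_rate=0.60)
print(should_promote(challenger, champion))  # True
```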
Key Requirement: Model Interpretation
• We have deployed LIME (Local Interpretable Model-agnostic Explanations) for customers:
– Improves trust
– Supports compliance with the EU’s General Data Protection Regulation (GDPR)
[Figure: example explanation for a transaction scored at 17.6% fraud probability, listing the features Customer Amount, Debit Amount, Avg. # Trans Cred Acc, # Prev to This Dest and # Xfer Accounts, with the annotation "X% score due to: + transfer amount, + destination country, + last year monthly spend". Question posed: which features are most important to this decision?]
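The core LIME idea is: perturb the input around one transaction, weight each perturbed sample by its proximity to the original, and fit a weighted linear model whose coefficients explain the black-box score locally. The sketch below is a simplified from-scratch illustration of that idea (continuous Gaussian perturbations, exponential kernel), not the `lime` library's API; `fraud_score` and its three scaled features are hypothetical.

```python
import math
import random

def _solve(a, b):
    """Solve the linear system a·x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def explain_locally(predict, x, n_samples=500, noise=1.0, kernel_width=1.0, seed=0):
    """LIME-style local explanation of a black-box `predict` around point `x`.

    1. Sample perturbations of x with Gaussian noise.
    2. Weight each sample by its proximity to x (exponential kernel).
    3. Fit a weighted linear model; its coefficients are the explanation.
    Returns (intercept, per-feature coefficients).
    """
    rng = random.Random(seed)
    d = len(x)
    ztwz = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates Z^T W Z
    ztwy = [0.0] * (d + 1)                          # accumulates Z^T W y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, noise) for xi in x]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + z
        y = predict(z)
        for i in range(d + 1):
            ztwy[i] += w * row[i] * y
            for j in range(d + 1):
                ztwz[i][j] += w * row[i] * row[j]
    beta = _solve(ztwz, ztwy)
    return beta[0], beta[1:]

# Hypothetical black-box scorer over [scaled amount, destination risk, scaled monthly spend].
def fraud_score(v):
    return 1.0 / (1.0 + math.exp(-(1.5 * v[0] + 2.0 * v[1] - 0.8 * v[2] - 3.0)))

_, coefs = explain_locally(fraud_score, [1.2, 1.0, 0.4])
for name, c in zip(["amount", "destination_risk", "monthly_spend"], coefs):
    print(f"{name:>16}: {c:+.3f}")
```

A positive coefficient means the feature pushes this particular transaction's score up, which is the kind of "+ transfer amount, + destination country" annotation shown on the slide.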
Machine Learning Results (Live System: 60 transactions/sec.)
Ensemble of boosted decision trees and logistic regression.
From online validation of the model:
● 25-30% reduction in false positives, with an increase of over 35% in detection rate
● Opportunity to expand the model with additional features, retrain on recent data and add further models to the ensemble
● Models can be extended to additional channels
[Chart: comparison with the rule engine on the validation set.]
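The deck does not say how the boosted trees and the logistic regression are combined. A common choice is soft voting, i.e. a (weighted) average of each model's fraud probability; the sketch below assumes that, and the scores and weights are hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-model fraud probabilities by (weighted) averaging."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total

# Hypothetical scores for one transaction from the two model families.
boosted_trees_p = 0.30
logistic_regression_p = 0.10
print(soft_vote([boosted_trees_p, logistic_regression_p]))          # equal weights
print(soft_vote([boosted_trees_p, logistic_regression_p], [3, 1]))  # trust the trees more
```

Averaging probabilities keeps the output on the same 0-1 scale as each model, so the same review threshold can be applied after new models are added to the ensemble.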
Deep Learning
Deep Learning Opportunity
• Current models catch only ~70% of all fraud cases
• Traditional ML models view each transaction atomically, in isolation
• Frequently missed fraud transactions are part of a series
• Deep learning can capture correlations across many features
Three Deep Learning Architectures to Deliver Value
ConvNet
• Designed for spatially correlated features, but by transforming transactions into a 2D image, we can learn temporally correlated features.
• A deeper ConvNet can learn more complex and general features.
Goal: learn kernels from temporal and static features to gain insight into the characteristics of fraud.
LSTM
• Learns temporal information and classifies whether a sequence of transactions contains fraud.
• Shares learned weights across time steps.
Goal: learn transaction patterns within a window. Two approaches can be tested: flag fraud directly, or predict the next transaction and define an error.
Auto-Encoders
• Learn how to generate normal transactions, making use of the potentially large volumes of non-fraud data.
• Auto-encoders (AEs) provide a compressed, low-dimensional representation of the data.
Goal: build a model that learns how to generate non-fraud data. To detect fraud, define a reconstruction-error threshold: transactions the model reconstructs poorly are flagged as fraud.
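The auto-encoder approach only needs a scoring rule once the network is trained: compute each transaction's reconstruction error and flag those above a threshold fitted on non-fraud data. The sketch below shows just that scoring side; it assumes the reconstructions `x_hat` come from an already-trained auto-encoder (not shown), and the quantile-based threshold is one illustrative choice.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between a transaction vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(normal_errors, quantile=0.99):
    """Return the error at the given quantile of non-fraud reconstruction errors."""
    ranked = sorted(normal_errors)
    idx = min(int(quantile * len(ranked)), len(ranked) - 1)
    return ranked[idx]

def flag_fraud(errors, threshold):
    """Flag transactions the auto-encoder reconstructs poorly."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors measured on held-out non-fraud transactions.
normal_errors = [0.01 * i for i in range(1, 101)]
threshold = fit_threshold(normal_errors, quantile=0.95)
print(flag_fraud([0.02, 3.0], threshold))  # [False, True]
```

Because the threshold is fitted only on non-fraud data, this approach can make use of the large volume of normal transactions even when labelled fraud is scarce.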
How Can We Create an Image From Bank Transactions?
[Figure: input is a series of transactions at times t0, t1, t2, …, ts (spaced dt apart), each a row of raw features X_0, X_1, …, X_n. In the output image, every raw feature is surrounded by its top-k most correlated features, added in a clockwise manner (for example, X_0 with X_41, X_5, X_30, X_29, X_31, X_10, X_37 and X_3).]
Image size is: [10 x 3, 50 x 3, 1], i.e. 30 x 150 pixels with one channel.
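A possible reading of this construction in code, under assumptions not spelled out in the deck: each raw value becomes the centre of its own 3x3 block, its 8 most correlated features (by absolute correlation) from the same transaction fill the surrounding cells clockwise starting at the top-left, and a window of 10 transactions with 50 features gives the stated [10 x 3, 50 x 3, 1] size. The correlation matrix in the usage example is synthetic; in practice it would be estimated from historical data.

```python
import random

# Offsets of the 8 neighbours around the centre of a 3x3 block, clockwise from top-left.
CLOCKWISE = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def top_correlated(corr, k=8):
    """For each feature, the indices of its k most (absolutely) correlated other features."""
    n = len(corr)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: -abs(corr[i][j]))[:k]
        for i in range(n)
    ]

def transactions_to_image(window, neighbours):
    """Expand a (time x features) window into a (3*time x 3*features) image.

    Each raw value sits at the centre of its own 3x3 block, and the values of
    its most correlated features from the same transaction are placed
    clockwise around it.
    """
    t, f = len(window), len(window[0])
    image = [[0.0] * (3 * f) for _ in range(3 * t)]
    for r, row in enumerate(window):
        for c, value in enumerate(row):
            cy, cx = 3 * r + 1, 3 * c + 1
            image[cy][cx] = value
            for (dy, dx), j in zip(CLOCKWISE, neighbours[c]):
                image[cy + dy][cx + dx] = row[j]
    return image

rng = random.Random(42)
window = [[rng.random() for _ in range(50)] for _ in range(10)]  # 10 transactions, 50 features
# Synthetic correlation matrix: nearby feature indices are more correlated.
corr = [[1.0 if i == j else 1.0 / (1 + abs(i - j)) for j in range(50)] for i in range(50)]
image = transactions_to_image(window, top_correlated(corr))
print(len(image), len(image[0]))  # 30 150
```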
Convolutional Layers for Trans2D
First convolutional layer architecture: [Figure: 3x3 kernels slide over the transaction image, with features along one axis and time along the other, using strides of 3 and strides of 1.]
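A worked check of why a 3x3 kernel with stride 3 fits this image: assuming no padding, it visits each 3x3 feature block exactly once, so on the [10 x 3, 50 x 3, 1] image stated above the first layer produces one activation per (transaction, feature) pair, while stride 1 also mixes neighbouring blocks. The function uses the standard output-length formula; the padding assumption is ours.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

height, width = 10 * 3, 50 * 3  # the [10 x 3, 50 x 3, 1] Trans2D image
print(conv_output_size(height, 3, 3), conv_output_size(width, 3, 3))  # 10 50
print(conv_output_size(height, 3, 1), conv_output_size(width, 3, 1))  # 28 148
```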
2D Transaction Image Example
[Figure: example transaction images for a non-fraud and a fraud case; x-axis: features, y-axis: time.]
Network Architecture for CNNs
[Figure: CNN architectures that classify a transaction image as Fraud or Normal.]
Inside the ResNet Model
[Figure: ResNet with 64 filters and residual blocks; activations after the CNN shown for fraud and non-fraud examples.]
Deep Learning: First Results on the Fraud Verification Dataset
Comparison of the three deep learning models and the traditional machine learning ensemble model:
• Ensemble model (AUC 0.89)
• ConvNets (AUC 0.95)
• LSTM (AUC 0.90)
• ResNet (AUC 0.94)
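AUC can be read as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case, so 0.95 versus 0.89 means the ConvNets rank fraud above legitimate transactions more consistently. The sketch below computes it directly from that definition on a tiny made-up example; it is quadratic in the number of cases and meant only for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random fraud case outscores a random
    non-fraud case (ties count half). O(P*N); fine for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.875
```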
Lessons Learned: Take-Aways from the Client Project
• Deep learning adoption: from pictures to financial transactions
• Enhancement of data quality and cluster capabilities through data ingestion
• Building AnalyticsOps capabilities to support business units
• Leveraging experience from fraud advanced analytics to deliver further use cases
Big Data Team – Lessons Learned
• Success: from PowerPoint to production in 8 sprints
• Team effort: close collaboration across IMD, GFU and Think Big
• Synergy: successfully spearheaded innovation in all involved systems
• Inspiration: the bank's advanced analytics blueprint sets a generic template for tackling new challenges in advanced analytics
• Agile influence: an agile approach let us deliver quickly within the challenging timeframe
Appendix
Advanced Analytics Fraud Platform
• Develop a scalable, expandable platform that follows the bank’s digitalization blueprint
• Take a 100% data-driven approach to find patterns in the data and complement the existing fraud engine
• Use Hadoop to handle the large data volumes for training models on transaction data
• Implement a real-time solution that can score live transactions such as e-banking, credit card and mobile payments
• Reduce the number of false positives by at least 20-40%
Ambitions for the Advanced Analytics Project
• The goal is to enhance fraud detection and reduce false positives across products.
• Fraud is only the first of many possible advanced analytics use cases.
• The project leads to the creation of the bank’s advanced analytics blueprint.
• The bank’s ambition is to become one of the leading banks in advanced analytics capabilities.
Challenges in Fraud Detection at the Client
• Low detection rate: the existing human-written rule engine catches ~40% of fraud cases.
• Many false positives: 99.5% of all cases investigated are not fraud related.
• High fraud loss: tens of millions of € in total fraud per month.
• Mobile payments are quickly growing in number.
• Fraud is evolving rapidly, with increased sophistication: the bank must modernize its anti-fraud arsenal.