2nd Annual Machine Learning in Quantitative Finance
Synthetic Data Generation for Machine Learning in Finance
2020 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
QuantUniversity
9/1/2020
Speaker
• Quant, Data Science & ML practitioner
• Prior experience at MathWorks, Citigroup, and Endeca, and with 25+ financial services and energy customers
• Columnist for the Wilmott Magazine
• Author of forthcoming book “Pragmatic Machine Learning in Finance”
• Teaches Data Science/AI at Northeastern University, Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
About QuantUniversity
• Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory
• Trained more than 1000 students in Quantitative methods, Data Science, ML and Big Data Technologies
• Building a platform for operationalizing AI and Machine Learning in the Enterprise
1. Challenges with Real Datasets
2. Synthetic Dataset generation tools
▫ Proprietary
▫ Open Source
– Faker
– Data Synthesizer
– SDV
– Synthpop
– GANs
3. Demos
▫ VIX Data Generator
Agenda
Challenges with Real Datasets
Coverage
It may not be feasible to get samples for all categories:
• Lighting conditions
• Modifications (glasses/no glasses, moustache/no moustache, etc.)
• Positions
Challenges with real datasets
Not all scenarios have played out
• Stress scenarios
• What-if scenarios
Challenges with real datasets
Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
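One way to produce stress and what-if scenarios that history has not yet delivered is to simulate them. The sketch below is only an illustration: it assumes prices follow geometric Brownian motion, and the drift and volatility values for the "stressed" regime are made-up numbers, not calibrated ones.

```python
import math
import random

def simulate_paths(s0, mu, sigma, days, n_paths, seed=0):
    """Simulate daily price paths under geometric Brownian motion (an assumption)."""
    rng = random.Random(seed)
    dt = 1.0 / 252  # one trading day, in years
    paths = []
    for _ in range(n_paths):
        price = s0
        path = [price]
        for _ in range(days):
            z = rng.gauss(0.0, 1.0)
            price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            path.append(price)
        paths.append(path)
    return paths

def worst_drawdown(path):
    """Largest peak-to-trough loss along a path, as a negative fraction."""
    peak, worst = path[0], 0.0
    for p in path:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst

# Hypothetical parameters: a calm regime and a stressed what-if regime.
base = simulate_paths(100.0, mu=0.05, sigma=0.15, days=252, n_paths=200)
stressed = simulate_paths(100.0, mu=-0.20, sigma=0.60, days=252, n_paths=200)

avg_base = sum(worst_drawdown(p) for p in base) / len(base)
avg_stressed = sum(worst_drawdown(p) for p in stressed) / len(stressed)
print(f"average worst drawdown, base:     {avg_base:.1%}")
print(f"average worst drawdown, stressed: {avg_stressed:.1%}")
```

The stressed paths can then be fed to a model alongside the historical data to see how it behaves in conditions the real sample never covered.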
Missing values
• Missing at random
• Missing sequences
• Need data to fill frames
Challenges with real datasets
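A minimal sketch of filling missing sequences, assuming simple linear interpolation between the nearest observed neighbours is acceptable. The function name `fill_gaps` and the sample series are illustrative only; real pipelines often need model-based imputation instead.

```python
def fill_gaps(values):
    """Fill None gaps by linear interpolation between observed neighbours.

    Gaps at the start or end copy the nearest observed value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed value")
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)  # position of the gap between neighbours
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

series = [10.0, None, None, 13.0, None, 15.0, None]
print(fill_gaps(series))
```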
• Access
▫ Hard to find
▫ Rare class problems
▫ Privacy concerns that make data difficult to share
Challenges with real datasets
Imbalanced
• Need more samples of the rare class
• Need proxies for data points that were not observed or recorded
Challenges with real datasets
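One common way to get more samples of a rare class is to interpolate between existing rare-class points, in the spirit of SMOTE. The slide does not name a technique, so the sketch below is an assumption, and the two-feature "fraud" points are invented for illustration.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic minority points on segments between near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical rare-class observations with two numeric features.
fraud = [(1.0, 5.0), (1.2, 5.5), (0.9, 4.8), (1.1, 5.2)]
new_points = smote_like(fraud, n_new=6)
for point in new_points:
    print(point)
```

Because every new point lies between two observed points, this approach cannot create values outside the range already seen in the rare class.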
Labels
• Human labeling is hard
• Synthetic label generators
Challenges with real datasets
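The slide does not say how synthetic labels are generated. One possible reading is programmatic labeling, where several noisy rules vote on each record. The sketch below assumes that reading; the rule names, thresholds and transaction fields are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion

def lf_large_amount(txn):
    return "suspicious" if txn["amount"] > 10_000 else ABSTAIN

def lf_night_time(txn):
    return "suspicious" if txn["hour"] < 5 else ABSTAIN

def lf_known_merchant(txn):
    return "normal" if txn["known_merchant"] else ABSTAIN

LABELING_FUNCTIONS = [lf_large_amount, lf_night_time, lf_known_merchant]

def weak_label(txn):
    """Majority vote over the rules; abstain on ties or when no rule fires."""
    votes = [lf(txn) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

example = {"amount": 50_000, "hour": 3, "known_merchant": True}
print(weak_label(example))
```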
Tools for Synthetic Data Generation
Proprietary Tools
Company: Core technology
• Tonic.ai: All-in-one platform for data anonymization, subsetting, and synthesis, integrated with databases (Hadoop, Oracle, MySQL, MS SQL Server, MongoDB, Amazon Aurora/Redshift, and Google BigQuery). Uses Condenser and Masquerade.
• Mostly.ai: Tabular data using generative deep neural networks (no image data).
• CVEDIA: Sensor modeling and algorithm training. Handles images using SynCity as a custom pocket laboratory to generate highly entropic scenes, conditions, and metadata. Enables real-time Hardware-In-the-Loop (HWIL), Human-In-the-Loop (HITL), or Software-In-the-Loop (SIL) simulations, even with complex sensor configurations.
• Deep Vision Data: Image creation; synthetic training data.
• Synthesis.ai: The data generation platform for computer vision.
López de Prado, Marcos, Machine Learning for Asset Managers,
Cambridge University Press 2020
Open Source tools
SDV
https://www.computer.org/csdl/proceedings-article/dsaa/2016/07796926/12OmNwx3Q7S
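The SDV paper linked above models tables with a Gaussian copula. The following is a from-scratch, two-column sketch of that idea using only the standard library; it is not the SDV library's API, and the function names and the income/loan figures are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(column):
    """Map a column to standard-normal scores through its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def empirical_quantile(sorted_column, u):
    """Return the observed value at quantile u of the sorted column."""
    idx = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[idx]

def sample_copula(col_a, col_b, n_rows, seed=0):
    """Sample rows whose dependence follows a fitted two-column Gaussian copula."""
    rng = random.Random(seed)
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rows = []
    for _ in range(n_rows):
        # Draw a correlated pair of standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho ** 2)) * rng.gauss(0.0, 1.0)
        # Map each normal back to the column's own marginal distribution.
        rows.append((
            empirical_quantile(sorted_a, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_b, STD_NORMAL.cdf(z2)),
        ))
    return rows

# Hypothetical data: income and loan amount, in thousands.
income = [30, 42, 55, 61, 75, 88, 95, 120]
loan = [5, 9, 8, 14, 15, 21, 19, 30]
synthetic = sample_copula(income, loan, n_rows=500)
print(synthetic[:5])
```

This sketch reuses observed values for the marginals, so it preserves the dependence between columns but does not invent new values; a fuller implementation would fit a parametric marginal to each column.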
Data Synthesizer
https://faculty.washington.edu/billhowe/publications/pdfs/ping17datasynthesizer.pdf
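The DataSynthesizer paper linked above describes, among other modes, an independent attribute mode that samples each column from its own noisy histogram. The sketch below illustrates that idea with the standard library only. It is not the DataSynthesizer API; the function names and sample rows are assumptions, and the noise handling is illustrative rather than a verified privacy guarantee.

```python
import random
from collections import Counter

def noisy_histogram(column, epsilon, rng):
    """Per-category counts plus Laplace noise, clipped at zero and normalised."""
    counts = Counter(column)
    noisy = {}
    for value, count in counts.items():
        # The difference of two exponentials with rate epsilon is Laplace
        # distributed with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        noisy[value] = max(count + noise, 0.0)
    total = sum(noisy.values())
    if total == 0.0:
        # All counts were clipped to zero: fall back to a uniform distribution.
        return {v: 1.0 / len(noisy) for v in noisy}
    return {v: c / total for v, c in noisy.items()}

def synthesize_independent(rows, n_rows, epsilon=1.0, seed=0):
    """Sample every column independently from its noisy histogram."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    hists = [noisy_histogram(col, epsilon, rng) for col in columns]
    out = []
    for _ in range(n_rows):
        out.append(tuple(
            rng.choices(list(h.keys()), weights=list(h.values()))[0]
            for h in hists
        ))
    return out

# Hypothetical loan records: (grade, purpose).
real_rows = [
    ("A", "car"), ("A", "home"), ("B", "car"),
    ("B", "education"), ("C", "car"), ("A", "home"),
]
fake_rows = synthesize_independent(real_rows, n_rows=10)
print(fake_rows)
```

Sampling columns independently discards any relationship between them, which is the trade-off this mode makes.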
Synthpop
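Synthpop is an R package that synthesises variables one at a time, each conditional on the variables already synthesised, using CART models by default. The Python sketch below only mimics the sequential idea: as a simplifying assumption it conditions each column on the previous column alone, with a lookup table standing in for the tree model. Names and data are illustrative.

```python
import random
from collections import defaultdict

def sequential_synthesis(rows, n_rows, seed=0):
    """Synthesise columns left to right, each conditional on the previous one."""
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # tables[j] maps the previous column's value to observed values of column j.
    # The first column has no predecessor, so it uses the single key None.
    tables = []
    for j in range(n_cols):
        groups = defaultdict(list)
        for row in rows:
            key = row[j - 1] if j > 0 else None
            groups[key].append(row[j])
        tables.append(groups)
    out = []
    for _ in range(n_rows):
        new = []
        for j in range(n_cols):
            key = new[j - 1] if j > 0 else None
            new.append(rng.choice(tables[j][key]))
        out.append(tuple(new))
    return out

# Hypothetical records: (region, product, channel).
observed = [
    ("north", "loan", "branch"),
    ("north", "card", "online"),
    ("south", "loan", "online"),
    ("south", "card", "branch"),
    ("south", "card", "online"),
]
synthetic_rows = sequential_synthesis(observed, n_rows=8)
print(synthetic_rows)
```

Because each column depends only on its predecessor, the output can contain full rows that never appear in the observed data while still respecting every pairwise pattern between adjacent columns.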
VAE
https://arxiv.org/pdf/1808.06444.pdf
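For reference, the standard training objective of a variational autoencoder is the evidence lower bound (ELBO). The notation below is the usual textbook one and is an assumption here, since the slide itself shows only a figure: an encoder q with parameters phi, a decoder p with parameters theta, and a prior p(z).

```latex
\[
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
\]
```

The first term rewards accurate reconstruction and the second keeps the encoder close to the prior, which is what allows new synthetic records to be generated by sampling z from p(z) and decoding.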
GAN
https://developers.google.com/machine-learning/gan/gan_structure
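The generator and discriminator structure can be summarised by the original GAN minimax objective. The symbols are the conventional ones and are not taken from the slide: G is the generator, D the discriminator, and p_z the noise distribution.

```latex
\[
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\]
```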
WGAN
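The Wasserstein GAN replaces the discriminator with a critic f constrained to be 1-Lipschitz and optimises an estimate of the Wasserstein-1 distance. The form below uses conventional notation, assumed here because the slide carries only the title: p_r is the real data distribution and p_z the noise distribution.

```latex
\[
\min_G \max_{\|f\|_L \le 1} \;
  \mathbb{E}_{x \sim p_r}\bigl[f(x)\bigr]
  - \mathbb{E}_{z \sim p_z}\bigl[f(G(z))\bigr]
\]
```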
Synthetic data in finance
Demo 1: Loan Data Synthesizer
Demo 2: Synthetic Sales data generation
Demo 3: Synthetic VIX generation
If you want to be a part of the QuSandbox private beta, contact us:
info@qusandbox
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.qu.academy
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
