Qu Speaker Series
Synthetic Data Generation in Finance
2020 Copyright QuantUniversity LLC.
Hosted By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.qu.academy
11/18/2020
Online
https://quspeakerseries14.splashthat.com/
QuantUniversity
• Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory
• Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R
• Building a platform for AI and Machine Learning Exploration and Experimentation
For registration information, go to
https://QuFallSchool.splashthat.com
https://Quwinterschool.splashthat.com
Speakers
Stefan is the founder and CEO of Applied AI. He advises Fortune 500 companies, investment
firms, and startups across industries on data & AI strategy, building data science teams, and
developing end-to-end machine learning solutions.
Before his current venture, he was a partner and managing director at an international
investment firm, where he built the predictive analytics and investment research practice.
He was also a senior executive at a global fintech company with operations in 15 markets,
advised Central Banks in emerging markets, and consulted for the World Bank.
He holds Master's degrees in Computer Science from Georgia Tech and in Economics from
Harvard and Free University Berlin, and is a CFA Charterholder.
SYNTHETIC DATA GENERATION IN
FINANCE WITH TIMEGAN & TENSORFLOW
STEFAN JANSEN
SYNTHETIC DATA FOR FINANCE
AGENDA
▸ What are Generative Adversarial Networks (GANs)?

▸ Why could synthetic data for finance be useful?

▸ How can we build a TimeGAN using TensorFlow 2?

▸ How should we evaluate the quality of the results?
GANs are the most interesting
idea in ML in the last 10 years.

Yann LeCun
SYNTHETIC DATA FOR FINANCE - INTRO
WHAT ARE GENERATIVE ADVERSARIAL NETWORKS?
▸ Ian Goodfellow, et al. (NeurIPS 2014): learn a generative model via an adversarial
process that simultaneously trains two models, a generator and a discriminator
(~minimax 2-player game)
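
For reference, the two-player game is usually written as the value function from the original GAN paper. Here G is the generator, D the discriminator, p_data the distribution of the real data, and p_z the noise prior that feeds the generator:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by scoring real samples near 1 and generated samples near 0, while the generator minimizes it by producing samples that the discriminator scores as real.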
▸ Wave of research, over 500 different GAN architectures by 2018

▸ From fake celebrities to style transfer, music generation and more

▸ Medical Time Series: Recurrent (Conditional) GAN - Esteban, et al. (2017)

▸ Generates real-valued time-series data resembling records from an intensive care unit

▸ RNNs in both the generator and the discriminator, each conditioned on auxiliary
information about the state of the patient

▸ Train an early-warning system on synthetic data, test on real data => only minor
degradation in performance
SYNTHETIC DATA FOR FINANCE - MOTIVATION
WHY SYNTHETIC DATA FOR FINANCE?
▸ Financial data is relatively scarce (compared to web-scale images, etc.) and does
not grow at the same speed as sources in other domains

▸ Limited training data availability increases the risk of model and backtest
overfitting

▸ But how can we generate data that reflects the temporal dynamics of financial
market time series?
SYNTHETIC DATA FOR FINANCE - TIMEGAN
TIME-SERIES GAN - JINSUNG YOON ET AL. (GOOGLE / CAMBRIDGE), NEURIPS 2019
▸ Goal: create a good generative model that preserves temporal dynamics so that new
sequences respect the relationships 

▸ between variables

▸ across time

▸ Generate realistic data by combining “the flexibility of the unsupervised paradigm with
the control afforded by supervised training”. 

▸ Approach: learn an embedding space (think word2vec), optimized with both supervised
and adversarial objectives to encourage the network to adhere to the dynamics of the
training data during sampling (see paper for the math…)
TIMEGAN ARCHITECTURE: 2 NETWORKS, 4 COMPONENTS, 3 LOSS FUNCTIONS
1. Unsupervised adversarial loss on real and synthetic sequences

2. Supervised loss (minimized by training both the embedding and generator networks) captures the stepwise conditional distributions in the data.

3. The embedding network reduces the dimensionality of the adversarial learning space, assuming that temporal dynamics are driven by fewer, lower-dimensional factors.
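
To make the loss terms concrete, here is a minimal, framework-free sketch of how they could be computed on toy numbers. It assumes mean squared error for the supervised and reconstruction terms and binary cross-entropy for the adversarial term, which is a common implementation choice and may differ from the paper's exact formulation. The reconstruction loss is the term the TimeGAN paper pairs with the embedding (autoencoder) network. All values and variable names are hypothetical stand-ins for network outputs.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(probs, label, eps=1e-7):
    """Binary cross-entropy of predicted probabilities against one constant label (0 or 1)."""
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(probs)


# Toy values standing in for network outputs (all hypothetical).
real_latent = [0.2, 0.4, 0.6, 0.8]    # embedding of one real sequence, steps 1..4
next_step_pred = [0.35, 0.65, 0.75]   # predicted latent for steps 2..4, given steps 1..3
real_series = [1.0, 2.0, 3.0, 4.0]    # the real sequence itself
reconstructed = [1.1, 1.9, 3.0, 4.2]  # real sequence after embedding and recovery
d_on_real = [0.9, 0.8]                # discriminator outputs on real sequences
d_on_fake = [0.3, 0.1]                # discriminator outputs on synthetic sequences

# Supervised loss: compare the predicted next latent step with the actual one.
supervised_loss = mse(real_latent[1:], next_step_pred)

# Reconstruction loss: the embedding must retain enough information to recover the input.
reconstruction_loss = mse(real_series, reconstructed)

# Adversarial loss: the discriminator wants real -> 1 and fake -> 0,
# while the generator wants the discriminator to output 1 on fakes.
discriminator_loss = bce(d_on_real, 1) + bce(d_on_fake, 0)
generator_adv_loss = bce(d_on_fake, 1)

print(supervised_loss, reconstruction_loss, discriminator_loss, generator_adv_loss)
```

In this toy setting the discriminator is doing well (low loss) and the generator poorly (high adversarial loss), which is the typical situation early in training.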
TIME-SERIES GAN - LET’S TAKE A LOOK AT THE CODE!
▸ Port original implementation from TF1 to TF2, attempt to simplify

▸ Minor changes: instead of OHLCV (open, high, low, close, volume) data, use daily closing prices for six different stocks

▸ https://github.com/stefan-jansen/synthetic-data-for-finance
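
A common preparation step for this kind of model, assumed here rather than taken from the slides, is to scale each price series to [0, 1] and cut the result into overlapping fixed-length windows that serve as training sequences. The sketch below uses plain Python; the function names, the toy prices, and the window length of 3 are all hypothetical.

```python
def min_max_scale(column):
    """Scale one price series to the [0, 1] range."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def rolling_windows(rows, seq_len):
    """Return all overlapping windows of seq_len consecutive rows."""
    return [rows[i:i + seq_len] for i in range(len(rows) - seq_len + 1)]


# Hypothetical closing prices: 5 days x 2 tickers (the talk uses six tickers).
prices = [
    [10.0, 100.0],
    [11.0, 98.0],
    [12.0, 102.0],
    [11.5, 104.0],
    [13.0, 101.0],
]

# Scale each ticker separately, then put the rows back together.
columns = list(zip(*prices))
scaled_columns = [min_max_scale(list(col)) for col in columns]
scaled_rows = [list(row) for row in zip(*scaled_columns)]

# Each window has shape (seq_len, number of tickers).
windows = rolling_windows(scaled_rows, seq_len=3)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Scaling per ticker keeps series with very different price levels comparable, and overlapping windows turn one long history into many short training sequences.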
SYNTHETIC DATA FOR FINANCE - RESULTS EVALUATION
EVALUATING THE QUALITY OF SYNTHETIC TIME-SERIES DATA
The TimeGAN authors use three practical criteria to assess the generated data:

1. Diversity: the distribution of the synthetic samples should roughly match that
of the real data

2. Fidelity: the sample series should be indistinguishable from the real data,
and 

3. Usefulness: the synthetic data should be as useful as their real counterparts
for solving a predictive task
ASSESSING DIVERSITY: VISUALIZATION USING PCA AND T-SNE
ASSESSING FIDELITY: TIME SERIES CLASSIFICATION PERFORMANCE
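
The fidelity test trains a classifier to tell real from synthetic sequences. If the synthetic data is good, the classifier should do no better than chance, so its accuracy on held-out data should be close to 0.5. The TimeGAN paper summarizes this as a discriminative score, |accuracy - 0.5|, where lower is better. The sketch below computes that score from labels and predictions; the classifier itself is omitted and the predictions are made up for illustration.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def discriminative_score(labels, predictions):
    """Distance of classifier accuracy from chance level (0.5); lower is better."""
    return abs(accuracy(labels, predictions) - 0.5)


# Hypothetical held-out set: 1 = real sequence, 0 = synthetic sequence.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 0, 1, 0, 0, 1, 0, 0]

score = discriminative_score(labels, predictions)
print(score)
```

A score near 0 means the classifier cannot separate the two sets; a score near 0.5 means it separates them almost perfectly.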
ASSESSING USEFULNESS: TRAIN ON SYNTHETIC, TEST ON REAL
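
The train-on-synthetic, test-on-real (TSTR) idea can be sketched as follows: fit a next-step predictor on synthetic sequences only, measure its mean absolute error on real sequences, and compare that with the same predictor fit on real data. As a simplifying assumption, this sketch swaps in a one-variable least-squares line for the sequence model used in the talk, and all of the data is made up.

```python
def make_pairs(series_list):
    """Turn sequences into (previous value, next value) training pairs."""
    xs, ys = [], []
    for s in series_list:
        xs.extend(s[:-1])
        ys.extend(s[1:])
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mae(xs, ys, slope, intercept):
    """Mean absolute error of the fitted line on (xs, ys)."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical scaled sequences.
synthetic = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
real_train = [[0.0, 0.1, 0.2, 0.3], [0.4, 0.5, 0.6, 0.7]]
real_test = [[0.2, 0.3, 0.4, 0.5]]

x_test, y_test = make_pairs(real_test)

# Train on synthetic, test on real (TSTR).
tstr_mae = mae(x_test, y_test, *fit_line(*make_pairs(synthetic)))

# Baseline: train on real, test on real (TRTR).
trtr_mae = mae(x_test, y_test, *fit_line(*make_pairs(real_train)))

print(tstr_mae, trtr_mae)
```

If the TSTR error is close to the TRTR error, the synthetic data carries roughly the same predictive signal as the real data for this task.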
SYNTHETIC DATA FOR FINANCE - CONCLUSION
KEY TAKEAWAYS
‣ Using a small dataset, TimeGAN creates synthetic data that, to some extent,
mimic actual stock price series.

‣ While it is far from conclusive that we can create artificial data that is in fact useful for
model training and backtesting at scale, it’s a promising first step.

‣ There are numerous avenues to build on this architecture by refining the
configuration and the training process, besides using more (diverse) data.
Demos, slides and video available on QuAcademy
Go to www.qu.academy
Instructions for the Lab:
1. Go to https://academy.qusandbox.com/#/register and register using the code:
"QUFALLSCHOOL"
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.