Implementation of Real Data
for Financial Market Simulation
using Clustering, Deep Learning, and
Artificial Financial Market
Masanori HIRANO1, Hiroyasu MATSUSHIMA2,
Kiyoshi IZUMI1, and Hiroki SAKAJI1
1 School of Engineering, The University of Tokyo
2 Center for Data Science Education and Research, Shiga University
hirano@g.ecc.u-tokyo.ac.jp
https://mhirano.jp/
Motivation
• Instability in financial markets
• 2008 financial crisis
• Flash Crash
• Price fluctuations caused by COVID-19
• Regulations are necessary
• New regulations such as Basel III
• Can they prevent crises like those above?
• Difficulties in financial markets
• Nonstationary
• Rare phenomena happen frequently
• => Simulation is a good solution, but it is not yet trustworthy.
• Find what is wrong
• Improve trustworthiness using actual data
©M.HIRANO & Izumi Lab.
DJIA on May 6, 2010 (Flash Crash)
DJIA in 2020 (COVID-19)
11/19/20 2
Last year: PRIMA 2019
• We showed the difference between simulation & data
• => Simulation models can overlook key features.
Missing feature in simulation
Our work
• Proposed a new model built using actual data
• Compared in simulation: traditional model vs. our new model w/ data
• Focus only on HFT-MM: a specific trader type & strategy
• Target: Tokyo Stock Exchange
• We analyzed special data provided by JPX
Tokyo Stock Exchange
What is HFT-MM?
• High-frequency trader (HFT) using a market-making (MM) strategy
• Market-making strategy:
• (Basically) place orders near the best price
• Profit from the spread (e.g., sell at 1001, buy at 999: 1001 − 999 = 2)
• Repeat continuously
• Risk hedging through high-frequency trading:
• Price-move risk is always present (price move >> spread)
• Act quickly & hedge risk by offsetting inventory
• => These features are easy to recognize in data
Sell
Buy
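A toy calculation can make the spread-versus-risk trade-off above concrete. This is only an illustrative sketch: the function names and the 10-tick adverse move are hypothetical and are not taken from the slides.

```python
def round_trip_pnl(bid: float, ask: float, volume: int = 1) -> float:
    """Profit from buying at the bid and selling at the ask once."""
    return (ask - bid) * volume


def inventory_pnl(inventory: int, price_move: float) -> float:
    """Mark-to-market gain or loss on held inventory after a price move."""
    return inventory * price_move


# Spread example from the slide: sell at 1001, buy at 999.
spread_profit = round_trip_pnl(bid=999, ask=1001)
# Hypothetical adverse move of 10 while still holding one unit.
loss_if_stuck = inventory_pnl(inventory=1, price_move=-10)
print(spread_profit, loss_if_stuck)
```

One round trip earns the spread, while a single unhedged unit can lose several times that on a larger price move, which is why fast inventory offsetting matters for this strategy.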
Data Extraction
We need HFT-MM ordering data…
Data
• “Order-book reproduction data”
provided by Japan Exchange Group (JPX)
• Contains masked trader information, called “Virtual Server (VS)”
Sample (some columns, such as volume, are not shown):
Time            | Ticker | Kind        | Buy/sell | VS  | Price
11:11:50.702813 | A      | Limit Order | sell     | VS1 | 2570
11:11:50.703600 | B      | Executed    | buy      | VS4 | Market Order
11:11:50.704001 | A      | Cancel      | sell     | VS1 | 2570
Indices for clustering (extracting HFT-MM)
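• The logarithm of actions per ticker
$\mathrm{ActionsPerTicker} = \frac{\mathrm{newOrders} + \mathrm{changeOrders} + \mathrm{cancelOrders}}{\mathrm{numTicker}}$
$\mathrm{ActionsPerTickerLOG} = \ln(\mathrm{ActionsPerTicker})$
• Inventory ratio
$\mathrm{InventoryRatioABS} = \mathrm{Median}_{\mathrm{ticker}} \left| \frac{\mathrm{soldVolume}_{\mathrm{ticker}} - \mathrm{boughtVolume}_{\mathrm{ticker}}}{\mathrm{soldVolume}_{\mathrm{ticker}} + \mathrm{boughtVolume}_{\mathrm{ticker}}} \right|$
• Executed order ratio
• Cancel order ratio
• Market order ratio
• The logarithm of tickers per VS
$\mathrm{TickerPerVSLOG} = \ln\left( \frac{\mathrm{numTicker}}{\mathrm{numVS}} \right)$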
Expected HFT-MM profile: many orders, low inventory, many VS used, low executed ratio, high cancel ratio, low market order ratio
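The three indices that the slide defines by formula could be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: the event field names (`vs`, `ticker`, `kind`, `side`, `volume`) and the kind labels are hypothetical, the aggregation period is not specified on the slide, and the executed, cancel, and market order ratios are left out because their exact definitions are not given.

```python
import math
from collections import defaultdict
from statistics import median


def clustering_indices(events: list) -> dict:
    """Compute three clustering indices for ONE trader.

    Each event is a dict with hypothetical keys:
    'vs', 'ticker', 'kind' ('new' | 'change' | 'cancel' | 'executed'),
    'side' ('buy' | 'sell') and 'volume'.
    """
    tickers = {e["ticker"] for e in events}
    servers = {e["vs"] for e in events}
    actions = sum(e["kind"] in ("new", "change", "cancel") for e in events)
    if not tickers or actions == 0:
        raise ValueError("need at least one ticker and one order action")

    # Executed volume per ticker, split by side.
    sold, bought = defaultdict(int), defaultdict(int)
    for e in events:
        if e["kind"] == "executed":
            side = sold if e["side"] == "sell" else bought
            side[e["ticker"]] += e["volume"]

    # |sold - bought| / (sold + bought) per ticker with any executed volume.
    ratios = [
        abs(sold[t] - bought[t]) / (sold[t] + bought[t])
        for t in tickers
        if sold[t] + bought[t] > 0
    ]
    return {
        "ActionsPerTickerLOG": math.log(actions / len(tickers)),
        "InventoryRatioABS": median(ratios) if ratios else float("nan"),
        "TickerPerVSLOG": math.log(len(tickers) / len(servers)),
    }


demo_events = [
    {"vs": "VS1", "ticker": "A", "kind": "new", "side": "sell", "volume": 100},
    {"vs": "VS1", "ticker": "A", "kind": "cancel", "side": "sell", "volume": 100},
    {"vs": "VS1", "ticker": "A", "kind": "executed", "side": "sell", "volume": 300},
    {"vs": "VS2", "ticker": "A", "kind": "executed", "side": "buy", "volume": 100},
    {"vs": "VS2", "ticker": "B", "kind": "new", "side": "buy", "volume": 100},
    {"vs": "VS2", "ticker": "B", "kind": "change", "side": "buy", "volume": 100},
    {"vs": "VS2", "ticker": "B", "kind": "executed", "side": "buy", "volume": 200},
    {"vs": "VS1", "ticker": "B", "kind": "executed", "side": "sell", "volume": 200},
]
demo_indices = clustering_indices(demo_events)
print(demo_indices)
```

A trader who buys and sells equal volume in a ticker gets an inventory ratio of 0 for that ticker, which matches the "low inventory" profile expected of a market maker.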
Data outline
• Jan. 2015 – mid-Sep. 2015: all 178 business days
• All traders: 2,654
• HFT only: 181 traders (those with ActionsPerTicker ≥ 1000)
• Hierarchical clustering to extract HFT-MM
Hierarchical Clustering [Uno et al. 18]
• Normalize each index, then cluster
• Euclidean distance
• Ward’s method
• 10 clusters
HFT-MM cluster based on indices
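The clustering settings listed above (normalization, Euclidean distance, Ward's method, 10 clusters) map directly onto SciPy's hierarchical clustering functions. The sketch below is an assumption-laden illustration: z-score normalization is a guess, since the slide does not say which normalization was used, and the input matrix is random placeholder data rather than the JPX indices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_traders(indices: np.ndarray, n_clusters: int = 10) -> np.ndarray:
    """Ward hierarchical clustering on per-column normalized indices.

    indices: array of shape (n_traders, n_indices).
    Returns one cluster label (1..n_clusters) per trader.
    """
    mean = indices.mean(axis=0)
    std = indices.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    normalized = (indices - mean) / std  # z-score: an assumed normalization
    tree = linkage(normalized, method="ward", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Placeholder input: 181 traders x 6 indices of random numbers, NOT real data.
rng = np.random.default_rng(0)
demo_matrix = rng.normal(size=(181, 6))
labels = cluster_traders(demo_matrix)
print(np.bincount(labels)[1:])  # cluster sizes
```

The HFT-MM cluster would then be picked by inspecting each cluster's mean indices against the expected market-maker profile.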
Data split
• We obtained the order data of HFT-MM traders
• 2015/01-07 => training of the HFT-MM model
• 2015/08 => evaluation of the simulation
Simulation & Models
Simulation outline
• We used “PlhamJ” as a simulation platform.
PlhamJ: Platform for Large-scale and High-frequency Artificial Market (Java version)
Simulation setting
• 1,000 stylized traders + 1 traditional HFT-MM trader
vs
• 1,000 stylized traders + 1 ML HFT-MM trader (new)
• Compare the behavior of:
• 1 traditional HFT-MM trader in simulation
• 1 ML HFT-MM trader (new) in simulation
• Real data (held out from the training data)
Stylized Trader Agents [Chiarella et al. 02]
• Predicted logarithmic return, used to set the bid/ask price
$r = \frac{1}{w_F + w_C + w_N} \left( w_F \cdot F + w_C \cdot C + w_N \cdot N \right)$
• Fundamental term
$F = \frac{1}{\text{mean reversion time}} \ln\left( \frac{\text{current market price}}{\text{current fundamental price}} \right)$
• Chartist (trend) term
$C$ = average logarithmic return in the past
• Noise term: $N \sim \mathcal{N}(0, \sigma_N)$
• + margin => decide the order price
• Every 100 steps they place a buy or sell order
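The weighted combination of the three terms can be sketched as follows. This is an illustration only, not PlhamJ code: the function and parameter names are hypothetical, $\sigma_N$ is assumed to be a standard deviation, and the fundamental term uses the price ratio in the order the slide gives it (some formulations of this agent put the fundamental price in the numerator, so the sign convention is an assumption). The margin step is omitted because the slide does not specify it.

```python
import math
import random


def predicted_log_return(
    w_f: float,
    w_c: float,
    w_n: float,
    market_price: float,
    fundamental_price: float,
    past_prices: list,
    mean_reversion_time: float,
    noise_sd: float,
    rng: random.Random,
) -> float:
    """Weighted average of fundamental, chartist, and noise terms."""
    # Fundamental term, ratio order as written on the slide (assumption).
    f = math.log(market_price / fundamental_price) / mean_reversion_time
    # Chartist term: mean of past one-step log returns.
    log_returns = [
        math.log(later / earlier)
        for earlier, later in zip(past_prices, past_prices[1:])
    ]
    c = sum(log_returns) / len(log_returns)
    # Noise term, sigma_N treated as a standard deviation (assumption).
    n = rng.gauss(0.0, noise_sd)
    return (w_f * f + w_c * c + w_n * n) / (w_f + w_c + w_n)


demo_return = predicted_log_return(
    w_f=1.0, w_c=1.0, w_n=1.0,
    market_price=110.0, fundamental_price=100.0,
    past_prices=[100.0, 110.0],
    mean_reversion_time=10.0,
    noise_sd=0.0,  # noise switched off so the demo is deterministic
    rng=random.Random(0),
)
print(demo_return)
```

With equal weights, each term contributes one third of the prediction; raising one weight shifts the agent toward fundamentalist, chartist, or noise-driven behavior.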
Traditional HFT-MM Trader [Avellaneda et al. 08]
• Trader’s price interval:
$\gamma_i \hat{\sigma}_i^2 + \frac{2}{\gamma_i} \ln\left( 1 + \frac{\gamma_i}{k} \right)$
• Trader’s mid-price:
$p_t^* - \gamma_i \hat{\sigma}_i^2 q_t^i$
• Note:
$\gamma_i$: risk-hedge level
$\hat{\sigma}_i$: SD in price
$k$: a parameter for order arrival
$p_t^*$: fundamental price
$q_t^i$: inventory
[Figure: price axis showing sell and buy orders, the fundamental price, the trader’s mid-price, and the trader’s price interval]
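The two formulas for the traditional HFT-MM trader can be turned into quotes as sketched below. The function name and parameter values are hypothetical, and placing the buy and sell orders symmetrically at half the price interval on each side of the trader's mid-price is an assumption consistent with the figure labels, not something the slide states explicitly.

```python
import math


def hft_mm_quotes(
    fundamental_price: float,
    inventory: float,
    gamma: float,
    sigma: float,
    k: float,
) -> tuple:
    """Return (bid, ask) for the traditional market-making model."""
    # Trader's price interval: gamma * sigma^2 + (2 / gamma) * ln(1 + gamma / k)
    interval = gamma * sigma**2 + (2.0 / gamma) * math.log(1.0 + gamma / k)
    # Trader's mid-price: shifted away from the fundamental price by inventory.
    mid = fundamental_price - gamma * sigma**2 * inventory
    # Symmetric placement around the mid-price (assumption).
    return mid - interval / 2.0, mid + interval / 2.0


flat_bid, flat_ask = hft_mm_quotes(1000.0, inventory=0, gamma=0.1, sigma=2.0, k=1.5)
long_bid, long_ask = hft_mm_quotes(1000.0, inventory=5, gamma=0.1, sigma=2.0, k=1.5)
print(flat_bid, flat_ask)
print(long_bid, long_ask)
```

Holding positive inventory lowers both quotes, which makes the trader more likely to sell and less likely to buy, so inventory drifts back toward zero.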
HFT-MM Machine Learned Model
• We build a model by applying machine learning to the data
• The model predicts the trader’s next action
Results
Comparison
Ticks between the best price and ordering of HFT-MM
Comparison in KL Divergence
• Our new ML model outperforms the traditional model, but only marginally…
• Why is the variance so large? =>
Distribution of $D_{KL}$ of our new model
Q      | P           | Mean     | SD
Actual | Traditional | 0.730009 | 0.119884
Actual | ML          | 0.648459 | 0.957854
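A KL divergence between two empirical distributions of order placement (in ticks from the best price) could be computed as in the sketch below. The direction $D_{KL}(Q \| P)$ with Q as the actual data, the additive smoothing constant `eps`, and the sample values are all assumptions for illustration; the slide does not state how the divergence was estimated.

```python
import math
from collections import Counter


def kl_divergence(q_samples: list, p_samples: list, eps: float = 1e-9) -> float:
    """D_KL(Q || P) between two discrete empirical distributions.

    eps is a small additive smoothing term (hypothetical choice) so that
    values missing from one sample do not produce a division by zero.
    """
    support = sorted(set(q_samples) | set(p_samples))
    q_counts, p_counts = Counter(q_samples), Counter(p_samples)
    q_total = len(q_samples) + eps * len(support)
    p_total = len(p_samples) + eps * len(support)
    total = 0.0
    for value in support:
        q = (q_counts[value] + eps) / q_total
        p = (p_counts[value] + eps) / p_total
        total += q * math.log(q / p)
    return total


# Made-up tick distances, for illustration only.
actual_ticks = [0, 0, 1, 1, 2, 3]
simulated_ticks = [0, 1, 1, 2, 2, 5]
print(kl_divergence(actual_ticks, simulated_ticks))
```

Repeating this over many simulation runs gives a sample of divergences whose mean and SD can be tabulated as above; a lower mean indicates simulated behavior closer to the actual data.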
Comparison w/ omission
• Error cases are easy to detect => omit them
• => With these cases omitted, the results are strong
Q      | P                 | Mean     | SD
Actual | Traditional       | 0.730009 | 0.119884
Actual | ML (w/ omissions) | 0.186192 | 0.085099
Discussion & Conclusion
• Our new model shows strong results w/ omission
• This reveals the need for, and benefits of, using real data
• However, the non-robustness of the ML model must be addressed
Future work
• More robust ML model
• Build models from data for all trader types

2020/11/19 PRIMA2020: Implementation of Real Data for Financial Market Simulation using Clustering, Deep Learning, and Artificial Financial Market
