Real Time Machine Learning Architecture &
Sentiment Analysis
Quantcon 2016, Singapore
Juan CHENG, PHD
Data Scientist
cheng.juan@infotrie.com
www.infotrie.com
@infotrie
www.finsents.com
@finsents
● About us
● News analytics in finance
● A news analytics case
○ Information extraction of text
○ Text feature extraction for machine learning classification
○ Big data tools applied
○ Architecture that combines all
Frederic GEORJON
CEO
Ajil GEORGE
Head of Development Center
Daniel ABROUK
Head of EMEA
Paris/Singapore London
LONG Zhicheng
CTO
Singapore India
FinSentS.com
➔ Real-time information
and trading portal
➔ Millions of sources /
Multilingual
➔ SaaS or on-premises
➔ Real-time Alerts
➔ Actionable signals
Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / Tick by tick
➔ 50,000+ entities
➔ Stock, Forex, commodities,
index, Macroeconomic topics
etc…
Consultancy and Training
➔ Trading Technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language
Processing (NLP)
➔ Machine Learning
B.
No, I’m a quant. I find it hard to quantify news.
A.
No, I find news noisy. There is just too much of it.
C.
Yes. But I find using news is not very efficient. I have to manually relate it to my portfolio.
Access to News / News
management
- Visualization tools
- Filtering tools
- On demand view
Feed from multiple sources:
- Social Media
- Web based content
- Private sources
- Internal data
News Content Alerts
based on sentiment
indicator
Provide accurate information from a Big Data environment and push it in front of users in real time for risk management
Dashboard
- Consolidated
Dashboard
- Portfolio Alerts
Actionable indicators
Users receive news signals for trading / hedging / risk management based on sentiment indicator
Algo Trading / Robo Trading
Real Time algorithmic trading
Sentiment indicator and News
Analytics
Equity Research / Sales Team Hedging Trader / Prop Trader
- News Tag Cloud
- Filtering newsfeed with
Social media blotter, news
blotter
- Search Engine on demand
- Topics detection
- Rumours alerts
- News qualification per
importance
- Relevant information
from single screen
- Automatic Alert
- Integrated to OMS
Provide relevant news
analytics indicator for
hedging or trade idea
generation
News analytics signals fully integrated into algo trading strategies
Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion,
the boldest move yet by a telecommunications company to acquire content to
stream over its high-speed network to attract a growing number of online
viewers.
The trend of consolidation comes as technology advances have been upending
traditional entertainment companies. Many in the industry believe that getting
bigger is the best way to compete with companies like Google, Apple, Netflix and
Facebook.
David Goldman and Paul R. La Monica contributed to this report.
Source
Category
Time
Location
Named Entity
Sentiment
Event
Hacking skills, regex, NLP, named entity recognition, POS taggers
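The extraction step on the Reuters example can be sketched with plain regular expressions. This is only an illustration of the "regex" part of the toolbox, not the pipeline used in the talk: the patterns, the `extract_fields` helper and the keyword-based event rule are all assumptions made for this example, and a production system would rely on a trained named entity recognizer.

```python
import re

# Shortened copy of the article shown on the slide.
ARTICLE = """Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion,"""


def extract_fields(text):
    """Pull source, category, time, location, entities and event from raw text.

    Hypothetical helper: every pattern below is tuned to this one article.
    """
    lines = text.splitlines()
    source = lines[0].strip()
    # Header line has the shape "CATEGORY | date | time".
    category, date, time = [part.strip() for part in lines[1].split("|")]
    body = "\n".join(lines[2:])

    # Dateline: an upper-case location at the start of a line, followed by "-".
    location = re.search(r"^([A-Z][A-Z ]+)-", body, flags=re.MULTILINE)

    # Company names: capitalised words immediately followed by "Inc".
    companies = []
    for name in re.findall(r"\b((?:[A-Z][\w&]*\s)+)Inc\b", body):
        name = name.strip()
        if name not in companies:
            companies.append(name)

    # Money amounts such as "$85.4 billion".
    amounts = re.findall(r"\$\d+(?:\.\d+)?\s(?:billion|million)", body)

    # Toy event rule: a couple of trigger phrases mapped to one event label.
    event = "acquisition" if re.search(r"\b(acquires|agreed to buy)\b", body) else None

    return {
        "source": source,
        "category": category,
        "date": date,
        "time": time,
        "location": location.group(1) if location else None,
        "companies": companies,
        "amounts": amounts,
        "event": event,
    }


fields = extract_fields(ARTICLE)
```

Hand-written patterns like these break as soon as the layout of the source changes, which is why the slide lists named entity recognition and POS taggers next to regex.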
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Test Document Set:
d3: The sun in the sky is bright.
d4: We can see the shining sun, the
bright sun.
Vector Space Model (VSM)
t1 t2...
d1
d2 ...
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Vocabulary
Term frequency (TF)
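A minimal sketch of the vocabulary and term-frequency step on the four example documents. The stop-word list and the helper names (`tokenize`, `build_vocabulary`, `tf_vector`) are assumptions for this illustration; the slides do not say which tokenizer or stop words were used.

```python
import re

# Assumption: a tiny stop-word list chosen just for these four sentences.
STOP_WORDS = {"the", "is", "in", "we", "can", "see"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Map each distinct term of the training set to a column index."""
    vocab = {}
    for doc in docs:
        for term in tokenize(doc):
            vocab.setdefault(term, len(vocab))
    return vocab


def tf_vector(doc, vocab):
    """Count how often each vocabulary term occurs in one document."""
    vec = [0] * len(vocab)
    for term in tokenize(doc):
        if term in vocab:  # terms unseen in training are ignored
            vec[vocab[term]] += 1
    return vec


train = ["The sky is blue.", "The sun is bright."]
test = ["The sun in the sky is bright.",
        "We can see the shining sun, the bright sun."]

vocab = build_vocabulary(train)
tf_test = [tf_vector(doc, vocab) for doc in test]
```

Each document becomes one row of the vector space model, with one column per vocabulary term. "shining" in d4 is dropped because it never appears in the training set.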
TF alone overemphasizes terms that appear in almost every document of the corpus
TF-IDF
TF example IDF example
Normalized TF-IDF
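A short sketch of normalized TF-IDF on raw term counts. The slide's exact formula is not recoverable from the text, so this assumes one common smoothed variant, idf = ln((1 + N) / (1 + df)) + 1, followed by L2 normalization. The count matrix uses the columns (sky, blue, sun, bright) for documents d3 and d4 and is written out by hand for this example.

```python
import math


def idf_weights(tf_matrix):
    """Smoothed inverse document frequency for each column (assumed variant)."""
    n_docs = len(tf_matrix)
    n_terms = len(tf_matrix[0])
    weights = []
    for j in range(n_terms):
        # Document frequency: number of documents containing term j.
        df = sum(1 for row in tf_matrix if row[j] > 0)
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


def tfidf_row(tf_row, idf):
    """Weight raw counts by IDF, then scale the row to unit L2 length."""
    raw = [tf * w for tf, w in zip(tf_row, idf)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm if norm else 0.0 for x in raw]


# Columns: sky, blue, sun, bright. Rows: d3, d4.
tf_counts = [
    [1, 0, 1, 1],
    [0, 0, 2, 1],
]

idf = idf_weights(tf_counts)
tfidf = [tfidf_row(row, idf) for row in tf_counts]
```

"sun" and "bright" occur in both documents, so their IDF is the minimum value of 1, while "sky" occurs only in d3 and receives a higher weight there. This is the correction to plain TF that the slide describes.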
Machine Learning
- Companies, indexes
- People, locations, organizations
- Events
- Regions
NLP
Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources
Feature Extraction
Classification
Sentiment
- 15 years history
- Tens of millions of articles
Training
Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger,
happiness
Aggregation
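The feature extraction to classification step of this pipeline can be sketched with a tiny logistic regression trained by batch gradient descent. This is a toy stand-in, not the model used in the talk: the two features (counts of positive and negative terms per article), the training rows and the hyperparameters are all made up for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(x, w, b):
    """Probability that the article is positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [positive-term count, negative-term count].
X_train = [[2, 0], [1, 0], [0, 1], [0, 2]]
y_train = [1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative

w, b = train_logistic_regression(X_train, y_train)
p_positive = predict_proba([3, 0], w, b)
p_negative = predict_proba([0, 3], w, b)
```

In practice the input rows would be the TF-IDF vectors from the feature extraction step, and the per-article scores would then be aggregated by entity, sector or topic.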
❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault-tolerance
❏ Higher level abstraction than message passing
❏ Real-time machine learning for classification and predictive
analytics
Analytics on
Massive Historical
Text Data
Analytics on
recent past
Realtime
analytics
Batch layer / Real-time layer
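One way to read the batch layer / real-time layer split is that a query merges a precomputed view over historical data with a small incremental view over recent events. The sketch below assumes that reading; the `SentimentViews` class and its methods are hypothetical names, and clearing the real-time view on every batch load is a simplification.

```python
from collections import defaultdict


class SentimentViews:
    """Toy serving layer that merges a batch view with a real-time view."""

    def __init__(self):
        # entity -> (sum of scores, article count), recomputed by the batch layer
        self.batch_view = {}
        # entity -> [sum of scores, article count], updated per incoming event
        self.realtime_view = defaultdict(lambda: [0.0, 0])

    def load_batch_view(self, aggregates):
        """Swap in a fresh batch result; recent events are assumed to be in it."""
        self.batch_view = dict(aggregates)
        self.realtime_view.clear()

    def on_event(self, entity, score):
        """Real-time layer: fold one new article score into the running totals."""
        agg = self.realtime_view[entity]
        agg[0] += score
        agg[1] += 1

    def average_sentiment(self, entity):
        """Query: combine historical and recent aggregates."""
        b_sum, b_cnt = self.batch_view.get(entity, (0.0, 0))
        r_sum, r_cnt = self.realtime_view.get(entity, (0.0, 0))
        total = b_cnt + r_cnt
        return (b_sum + r_sum) / total if total else None


views = SentimentViews()
views.load_batch_view({"T": (3.0, 10)})   # 10 historical articles, score sum 3.0
views.on_event("T", -1.0)                  # two fresh negative articles
views.on_event("T", -1.0)
current = views.average_sentiment("T")
```

The batch layer can afford slow, exact recomputation over years of text, while the real-time layer only has to cover the gap since the last batch run.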
Fast and general engine for large-scale distributed data processing
Memory, network, CPUs, disk
Reference: Spark
Logistic regression in Hadoop and Spark
Open source distributed real-time computation system that makes it easy to process unbounded streams of data
Storm was benchmarked at processing one million 100-byte messages per second per node on hardware with the following specs:
● Processor: 2x Intel E5645 @ 2.4 GHz
● Memory: 24 GB
Reference: storm
Spout
Bolt
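The spout and bolt idea can be sketched with plain Python generators: a spout is a source of tuples, and each bolt consumes one stream and emits another. This is not the Apache Storm API, and it shows none of Storm's distribution, acking or fault tolerance; the function names and the sample tweets are made up for the illustration.

```python
import re

# Cashtags such as "$T" or "$TWX": a dollar sign followed by 1-5 capitals.
CASHTAG = re.compile(r"\$[A-Z]{1,5}\b")


def tweet_spout(tweets):
    """Spout: the source of the stream. Here it just replays a list."""
    for tweet in tweets:
        yield tweet


def cashtag_bolt(stream):
    """Bolt: emit one ticker per cashtag, dropping tweets that have none."""
    for tweet in stream:
        for tag in CASHTAG.findall(tweet):
            yield tag[1:]  # strip the leading "$"


def rolling_count_bolt(stream):
    """Bolt: keep a running mention count per ticker and emit each update."""
    counts = {}
    for ticker in stream:
        counts[ticker] = counts.get(ticker, 0) + 1
        yield ticker, counts[ticker]


tweets = ["$T to buy $TWX", "no tickers here", "$T shares slip"]
emitted = list(rolling_count_bolt(cashtag_bolt(tweet_spout(tweets))))
```

Wiring the generators together plays the role of a topology: each stage only knows its input stream, so stages can be swapped or extended without touching the others.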
✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher level abstraction than message
passing
✓ Real-time machine learning for
classification and predictive analytics
NoSQL Database
cache
persistent
Kafka
Filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation
Apache Storm
DFS
NLP models
ML models
Producers
Blogs, Twitter,
news,
Bloomberg...
Model training, batch
cleaning, batch calculation
Apache Spark
Solr
Relational
Database
Web
app
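The processing stages named in the diagram (filter, sentiment calculation, entity detection, stock mapping, sentiment aggregation) can be sketched as a chain of small functions. Everything here is an assumption made for the illustration: the word lists, the name-to-ticker table and the sample messages are hypothetical, topic classification is left out, and the real system runs these stages on Kafka and Storm with trained NLP and ML models rather than lookups.

```python
from collections import defaultdict

# Hypothetical toy resources standing in for the NLP and ML models.
POSITIVE = {"gains", "surges", "beats"}
NEGATIVE = {"falls", "strike", "lawsuit"}
TICKERS = {"at&t": "T", "time warner": "TWX"}  # illustrative mapping table


def is_relevant(msg):
    """Filter: drop empty messages."""
    return bool(msg.get("text", "").strip())


def sentiment_score(text):
    """Sentiment calculation: positive word hits minus negative word hits."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def detect_entities(text):
    """Entity detection: naive substring lookup of known company names."""
    lowered = text.lower()
    return [name for name in TICKERS if name in lowered]


def process(messages):
    """Run all stages and aggregate the average sentiment per ticker."""
    scores = defaultdict(list)
    for msg in messages:
        if not is_relevant(msg):
            continue
        score = sentiment_score(msg["text"])
        for name in detect_entities(msg["text"]):
            scores[TICKERS[name]].append(score)  # stock mapping
    return {ticker: sum(s) / len(s) for ticker, s in scores.items()}


messages = [
    {"text": "AT&T gains after deal"},
    {"text": "Time Warner falls on lawsuit"},
    {"text": "AT&T workers strike"},
    {"text": ""},
]
aggregated = process(messages)
```

Keeping each stage as a separate function mirrors the diagram: in the streaming version each one would become its own processing step that can be scaled independently.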
➔ Scale analysis pipeline
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Realtime analytics
➔ Online machine learning
Apply similar architecture in
Available @www.infotrie.com
contact@infotrie.com
@infotrie
www.finsents.com
@finsents
Sentiment in itself is a powerful trading indicator out of which
multiple trading strategies can be built
Simulate impact of
complex events
MIFID alert
Improve Client's communication
Regulatory
Process complex / low-signal
events
ESG monitoring
Ecological – Social –
Governance
A union calls for
a strike in a
factory in
Argentina?
Negative news coverage is
accelerating for a stock I
hold in the Chinese press but
has not yet reached the English press?
A European company
employs children in
Bangladesh (*)?
ACTIONS
text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
[Diagram labels: Job, Executor, Nimbus, ZooKeeper, Worker]
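The PySpark word-count one-liner on this slide chains three operations: `flatMap`, `map` and `reduceByKey`. A single-machine sketch in plain Python shows what each step does. The `word_count` function is a hypothetical name for this illustration, and nothing here is distributed.

```python
from itertools import chain


def word_count(lines):
    """Single-process equivalent of flatMap -> map -> reduceByKey."""
    # flatMap: split every line and flatten the results into one word stream.
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair every word with the count 1.
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the counts of pairs that share the same word.
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


counts = word_count(["the sun", "the sky"])
```

In Spark the same three steps run in parallel over partitions of the input file, and `reduceByKey` combines the partial sums across the cluster.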
Velocity
Big
Data
Variety
- News, blogs, social media,
analyst reports, company
announcement, traders’ chat
room…
- Financial reports, price,
economic events...
- Weather, GPS, image....
Volume
- ETL
- Machine learning
- Correlation analysis,
- regressions….
- As fast as possible


“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Juan Cheng, Data Scientist at Infotrie
