Deep Learning and Topological
Data Analysis for Advanced Data
Segmentation and Prediction
Big Data Week, 2017
Edward Kibardin
Unstructured Data
The vast majority of data is unstructured and unlabeled
• Medical and DNA profiling
• Images
• Text
• Stock market transactions
• Customer activities
• Sensor signals (IoT)
• System logs
• Sound
Unstructured Data
Business questions for unstructured and unlabeled data
• How many categories are in my dataset?
• Which categories are best for the business?
• Why are some objects not like the others?
• How can I contextualize new objects?
• Is there a simpler way to describe my data?
Text
Text Analysis
Legacy approach to text analysis
Text Analysis
New challenges
Understand the language in maintenance job descriptions.
Discover new job types, for better reporting and further investigations.
• 365,000 maintenance jobs
• Each description is 5 to 1,000 words long
Topological Data Analysis Pipeline
a. Compute a combinatorial model approximating the structure of the underlying space
b. Then compute topological invariants of this structure
c. Represent these topological invariants in 2D space
Reference: Teng Ma, Zhuangzhi Wu, Pei Luo, Lu Feng. Reeb graph computation through spectral clustering, 2011.
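Steps a and b of the pipeline can be illustrated with a minimal Python sketch. It is an assumption-laden example, not the method used in the talk: it builds a Vietoris-Rips complex (vertices, edges, triangles) as the combinatorial model and computes two simple invariants, the number of connected components and the Euler characteristic. The function names, the sample points, and the threshold `eps` are all hypothetical. Step c (the 2D representation) is not covered here.

```python
import math
from itertools import combinations


def rips_complex(points, eps):
    """Step a: Vietoris-Rips complex up to dimension 2 (vertices, edges, triangles)."""
    vertices = list(range(len(points)))
    # 1-simplices: pairs of points no farther apart than eps
    edges = [
        (i, j)
        for i, j in combinations(vertices, 2)
        if math.dist(points[i], points[j]) <= eps
    ]
    edge_set = set(edges)
    # 2-simplices: triples whose three pairwise edges are all present
    triangles = [
        (i, j, k)
        for i, j, k in combinations(vertices, 3)
        if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set
    ]
    return vertices, edges, triangles


def betti_0(vertices, edges):
    """Step b: number of connected components, computed with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in vertices})


def euler_characteristic(vertices, edges, triangles):
    """Step b: Euler characteristic V - E + T of the 2-dimensional complex."""
    return len(vertices) - len(edges) + len(triangles)


# Hypothetical sample: 8 evenly spaced points on the unit circle.
circle = [
    (math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
    for k in range(8)
]
vertices, edges, triangles = rips_complex(circle, eps=1.0)
circle_b0 = betti_0(vertices, edges)
circle_chi = euler_characteristic(vertices, edges, triangles)
print(circle_b0, circle_chi)
```

With `eps=1.0` only neighbouring points on the circle are connected, so the complex is a single closed loop: one connected component and an Euler characteristic of zero, which is what distinguishes a loop from a plain cluster of points.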
Sensor signals
(Internet of Things)
Sensor Signals (Internet of Things)
Legacy approach to sensor data analysis
Sensor Signals (Internet of Things)
New challenges
Human Activity Recognition Using Smartphones
Technical Research Centre for Dependency Care and Autonomous Living, Universitat Politècnica de Catalunya (BarcelonaTech)
Activities: standing, sitting, laying, walking, walking upstairs, walking downstairs, stand-to-sit, sit-to-lay, lay-to-sit, sit-to-stand, stand-to-lay
 BDW17 London - Edward Kibardin - Mitie PLC - Learning and Topological Data Analysis for Advanced Data Segmentations and Predictions
Customer
Activities
Customer Activity
Legacy approach to customer activity analysis
Deep Generative Nets + TDA
1. Learning of deep generative model
2. Fine-tuning using topological loss
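The slides do not say how the topological loss is defined, so the following Python sketch is only one possible stand-in. It assumes a loss that compares 0-dimensional persistence between the input point cloud and its latent representation; in a Vietoris-Rips filtration those values are the edge lengths of the minimum spanning tree. The function names `mst_edge_lengths` and `topological_loss` and the sample points are hypothetical, and the gradient-based fine-tuning step itself is not shown.

```python
import math


def mst_edge_lengths(points):
    """Sorted edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    in_tree = [False] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # pick the point outside the tree that is cheapest to attach
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting point contributes no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(lengths)


def topological_loss(data_points, latent_points):
    """Assumed loss: squared difference of sorted MST edge lengths.

    Both point clouds must contain the same number of points.
    """
    a = mst_edge_lengths(data_points)
    b = mst_edge_lengths(latent_points)
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 3D inputs and two candidate 2D latent codes.
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
latent_faithful = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
latent_collapsed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

loss_faithful = topological_loss(data, latent_faithful)
loss_collapsed = topological_loss(data, latent_collapsed)
print(loss_faithful, loss_collapsed)
```

A latent code that preserves the gap between the two groups of points gets zero loss, while one that pulls the distant point closer is penalised. In a real fine-tuning setup this quantity would have to be computed in a framework with automatic differentiation and added to the generative model's own training objective.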
Links
Topology and Data (Gunnar Carlsson):
http://www.ams.org/journals/bull/2009-46-02/S0273-0979-09-01249-X/S0273-0979-09-01249-X.pdf
Discrete Morse Theory and Persistent Homology (Kevin P. Knudson):
http://www.math.fsu.edu/~hironaka/FSUUF/knudson.pdf
Topological Persistence and Simplification (Herbert Edelsbrunner, David Letscher, Afra Zomorodian):
http://math.uchicago.edu/~shmuel/AAT-readings/Data%20Analysis%20/PersTop.pdf
Extracting and Composing Robust Features with Denoising Autoencoders (Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierre-Antoine Manzagol):
http://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf
More examples using
TDA and Deep Learning
Gym customer
activity patterns
ObamaCare health
insurance
20 Newsgroups text dataset
Motorcycles
Christian
Atheism
Religion.misc
Politics.guns
Politics.misc
Politics.mideast
Sci.crypt
Sci.med
Hockey
Baseball
Autos
Forsale
Mac.hardware
Electronics
Sci.space
Comp.graphics
Windows.x
Ms-windows.misc
Pc.hardware
info@datarefiner.com
www.datarefiner.com

Editor's Notes

  • #3, #4, #6, #7, #13 to #18, #20: Our data stores are full of raw, unlabeled data
  • #12: 0-dimensional simplices, 1-dimensional simplices, 2-dimensional simplices