Data science at the Edge
With NiFi, TensorFlow and a proper cluster for good measure
Simon Elliston Ball
@sireb
Simon Elliston Ball
• Product Manager
• Data Scientist
• Elephant herder
• @sireb
Data gravity
588,000,000 km
• Size
• Distance
Other types of data gravity
• Compliance
• Legislation
• Political
• Paranoia
Photo: https://flic.kr/p/JvW7qh
Sampling vs Big Data: a quick history
• Before we had cloud, clusters and GPUs…
• MPP
• Supercomputers
• Grids
• Cut down data size to fit in memory
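One common way to cut a stream down to something that fits in memory is reservoir sampling. The sketch below is an illustration only, not something shown in the talk; the function name, the `seed` parameter and the sample size are assumptions made for the example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    The stream length does not need to be known up front, and only
    k items are ever held in memory at once.
    """
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    picked = reservoir_sample(range(1_000_000), k=10)
    print(len(picked), "items kept from a million")
```

The trade-off is the one the slide hints at: the sample fits in memory, but anything rare in the full data set may simply not be in it.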
A quick intro to NiFi
• Guaranteed Delivery
• Prioritized queuing and buffering
• Data provenance
• Bi-directional communication
• Security – Authentication and multi-role authorization
• Visual command and control
• Templating
• Robust API
and lots of adapters
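To make "prioritized queuing and buffering" concrete, here is a toy buffer that hands out the most urgent item first and refuses new items once it is full. It is a conceptual sketch in plain Python, not NiFi's API or implementation; the class name `PrioritizedBuffer`, its methods and the `max_items` limit are all hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PrioritizedBuffer:
    """Bounded buffer that releases the most urgent payload first."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._heap: List[Tuple[int, int, Any]] = []
        # Monotonic counter keeps FIFO order within the same priority
        # and stops heapq from ever comparing payloads directly.
        self._counter = itertools.count()

    def offer(self, priority: int, payload: Any) -> bool:
        """Queue a payload; return False when the buffer is full.

        A False result is the signal for the sender to slow down.
        """
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), payload))
        return True

    def poll(self) -> Any:
        """Remove and return the payload with the lowest priority number."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    buf = PrioritizedBuffer(max_items=3)
    buf.offer(5, "camera frame A")
    buf.offer(1, "alert")
    buf.offer(5, "camera frame B")
    print(buf.offer(0, "one too many"))  # buffer is full
    print(buf.poll())
```

On a constrained edge link this is the behaviour that matters: small urgent messages jump ahead of bulky frames, and a full buffer pushes back on the producer instead of silently dropping data.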
Demo: sending stuff around
• Pushing camera frames to the cloud
Face detection
Key point locations
Lightweight models
Low contextual data
Face detection
• Simple Haar cascade in OpenCV: https://github.com/simonellistonball/nifi-OpenCV
Dlib Face Detection
• 68 Facial Point Model
• c. 100MB
TensorFlow in NiFi
• Our Haar cascade face detection didn’t do a great job
• Neural networks
• Relatively large models
• Haar cascade: 677KB of XML
• FaceNet trained model on LFW: 168 MB (and that’s zipped protobufs)
• TensorFlow: https://github.com/tspannhw/nifi-tensorflow-processor.git
Face recognition
• Huge databases of face hashes and feature measures
• Extra information and context around the person
• Computationally expensive and heavy network use
• Apple Face ID demo… too many people had tried the device beforehand, blew the database. One or two faces is easy, millions is another matter
Rocket ship to the cloud
https://www.nasa.gov/sites/default/files/thumbnails/image/s83-35620-3k.jpg
Cloud: ML all packaged up… for a price
TensorFlow on Spark
• Why?
• Doesn’t TensorFlow already have a distributed compute model?
Existing clusters, multi-purpose clusters:
• TensorFrames, TensorFlowOnSpark, CaffeOnSpark, Spark ML, SQL
• When?
• Training, batch scoring
Broadening the example
• Where is your context?
• Why do you need context?
• Detection
• Explanation
Body worn video
• Record everything
• Record when you remember to press the button
• Record when it matters
What about?
• Live assist
• Evidence and accountability
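"Record when it matters" usually also needs the moments just before it started to matter. One way to sketch that is a small rolling pre-roll buffer that is flushed into the recording when a trigger fires. This is an assumption about how such a recorder could work, not a description of any real body-worn camera; `PreRollRecorder` and the `matters` flag (standing in for a button press or an on-device model) are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List


class PreRollRecorder:
    """Keeps a short rolling window and saves it once recording starts."""

    def __init__(self, pre_roll_frames: int) -> None:
        # Old frames fall off the left automatically once the deque is full.
        self._pre_roll: Deque[Any] = deque(maxlen=pre_roll_frames)
        self.saved: List[Any] = []
        self.recording = False

    def on_frame(self, frame: Any, matters: bool) -> None:
        """Handle one frame; `matters` is the trigger decision for it."""
        if matters and not self.recording:
            # Trigger just fired: keep the lead-up, then start recording.
            self.saved.extend(self._pre_roll)
            self._pre_roll.clear()
            self.recording = True
        elif not matters:
            self.recording = False

        if self.recording:
            self.saved.append(frame)
        else:
            self._pre_roll.append(frame)


if __name__ == "__main__":
    recorder = PreRollRecorder(pre_roll_frames=3)
    for frame_no in range(10):
        recorder.on_frame(frame_no, matters=frame_no in (5, 6))
    print(recorder.saved)
```

Everything that never gets near a trigger is overwritten on the device, which keeps both storage and upload small.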
Cybersecurity: progressive context
• Record everything: PCAP
• Send up the (maybe) interesting bits
• Fetch detail on demand
[Diagram labels: Netflow; PCAP at edge; 1st-pass model; “small” data flow; security data analytics platform (adds context, more compute-intensive modelling, etc.); “Hmmm… That’s interesting”; “Let me tell you more…”]
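The three steps on this slide (record everything, send up the maybe-interesting bits, fetch detail on demand) can be sketched as a small edge-side store. Everything here is illustrative: the `Packet` and `EdgeCapture` names are hypothetical, and the byte-count threshold is only a stand-in for whatever first-pass model actually runs at the edge.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    flow_id: str
    size: int
    payload: bytes


class EdgeCapture:
    """Holds full captures locally and only ships small summaries upstream."""

    def __init__(self, size_threshold: int) -> None:
        self.size_threshold = size_threshold
        self._store: Dict[str, List[Packet]] = {}

    def ingest(self, packet: Packet) -> None:
        """Step 1: record everything, keyed by flow, at the edge."""
        self._store.setdefault(packet.flow_id, []).append(packet)

    def interesting_summaries(self) -> List[dict]:
        """Step 2: send up only compact summaries of flows worth a look."""
        summaries = []
        for flow_id, packets in self._store.items():
            total = sum(p.size for p in packets)
            # Stand-in for the first-pass model: flag unusually large flows.
            if total >= self.size_threshold:
                summaries.append(
                    {"flow_id": flow_id, "packets": len(packets), "bytes": total}
                )
        return summaries

    def fetch_detail(self, flow_id: str) -> List[Packet]:
        """Step 3: the central platform asks for the full capture of one flow."""
        return list(self._store.get(flow_id, []))


if __name__ == "__main__":
    edge = EdgeCapture(size_threshold=1000)
    edge.ingest(Packet("a", 100, b"ping"))
    edge.ingest(Packet("b", 900, b"upload part 1"))
    edge.ingest(Packet("b", 200, b"upload part 2"))
    for summary in edge.interesting_summaries():
        print(summary, "->", len(edge.fetch_detail(summary["flow_id"])), "packets")
```

Only the summaries cross the network by default; the heavy payloads move when the central platform decides a flow deserves the extra context.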
ANPR: or why you can’t hide from parking fines
Summary: progressive enhancement of context
• Is it worth processing? 677KB of local model
• Rough-cut and hashing: O(100MB) models
• Expensive deep analysis: cloud-scale models and data
name: Simon Elliston Ball
cognitive.face.emotion: surprise
cognitive.face.exposure: overExposure
cognitive.face.noise: high
Thank you!
@sireb

mcubed london - data science at the edge

Editor's Notes

  • #13: And what do we get for our 180x increase in model size?