Data @Altocloud
Maciej Dabrowski, Chief Data Scientist
HUG 05/2017 Dublin
1
Modern Customer Engagement
Customer Journey Analytics
Connect the dots with live analytics and AI to discover, analyse and predict customer behaviour patterns.
• Customer Context
• Behaviour Analytics
• Call Attribution to Campaigns
• Predictive Models
Digital Messaging
Connect with customers by having live web chat or SMS conversations, sending targeted messages and offers.
• SMS
• Web Chat
• FB Messenger, Twitter DM
• Offers & Surveys
• Scheduled Callbacks
Real Time Communications
Connect in real time using voice, video and screensharing to engage with exceptional customer service.
• Voice Calls
• Video
• Screen-share
How Companies Benefit
• Engage at the best time
• Accelerate revenue conversion
• Improve Customer Experience
• Resolve issues quickly
• Reduce calls / workload
• Increase First Call Resolution
• Reduce bounce and abandons
25 people
1 dragon
2 locations
8 nationalities
having fun
…and growing!
4
Altocloud Holistic Customer Journey
[Architecture diagram. Labels: web events; call, IVR and ticket events; queues; event processors; enrichment; aggregation; segmentation; model evaluation; actions creation; storage; batch model learning; event streams; outcome probabilities; real-time customer journey; actions; marketing automation; CRM; web hook.]
Holistic view of your customers
6
Focus on real-time analytics
Make predictions on live visitors in real time (within seconds) by:
Ingesting customer actions (events) and context
Building predictive models
Offering actions to customers based on real-time predictions
7
DISCLAIMER: NO LIPSTICK
This is not a sales pitch
Learn from the mistakes of others
Show what works and what does not
8
Agenda
Engineering challenges
Data platform
AI platform and workloads
9
Engineering challenges
Product complexity
• Communication platform
• Data platform
Scale
• Millions of events per day
• Billions of events overall
• Typically no stable schemas
Real-time aspects
• Response in second(s)
• Streaming nature
Reliability
• 24/7 availability
• Services go down
• Servers disappear
10
Altocloud Platform
[Diagram. Labels: Altocloud Platform; Altocloud Data Platform; APIs; message queues; data processors; storage.]
11
Tools that we use
Focus on open source (Apache)
12
Tools that we use - data
13
Why Spark
Fast for iterative algorithms (important for Machine Learning)
Good integration with other tools (Kafka and Cassandra)
One code base for streaming and batch processing
Easy to deploy and maintain
Growing ecosystem (SQL, MLlib, GraphX, …)
Large open-source community
14
Data source: Kafka
Pub-sub message broker
Fast: hundreds of MB/s on a single broker
Scalable: partitioned data streams
Durable: messages persisted to disk and replicated
Distributed: strong durability and fault-tolerance guarantees
Downside: requires ZooKeeper
15
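The idea behind Kafka's partitioned data streams can be illustrated with a minimal sketch. It assumes events are keyed by a hypothetical `visitor_id` and uses CRC32 as a stand-in hash; Kafka's own default partitioner uses a different hash, so the partition numbers here would not match a real cluster. The point is only that the same key always lands in the same partition, which keeps one visitor's events in order.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for an events topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition deterministically.

    CRC32 is a stand-in hash chosen for illustration only.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Group events by partition, preserving arrival order within each."""
    partitions = {p: [] for p in range(NUM_PARTITIONS)}
    for event in events:
        partitions[partition_for(event["visitor_id"])].append(event)
    return partitions


# Hypothetical events: two visitors, three actions.
events = [
    {"visitor_id": "v1", "type": "page_view"},
    {"visitor_id": "v2", "type": "search"},
    {"visitor_id": "v1", "type": "purchase"},
]
routed = route(events)
```

Because routing depends only on the key, consumers can be scaled out by assigning partitions to workers without splitting a single visitor's stream.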
Scalable storage: Cassandra
Easy to set up
High availability - no master
Great performance
CQL - SQL-like querying
Great support and bug-free drivers from DataStax
Key: design your schema around queries
16
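"Design your schema around queries" can be sketched without a database. The toy model below assumes one target query, "latest N events for a visitor on a given day", and lays the data out so that the query reads a single partition. The class, the partition key `(visitor_id, day)` and the clustering key `ts` are hypothetical and not taken from the talk.

```python
import bisect
from collections import defaultdict


class EventsByVisitorDay:
    """Toy model of a query-first table.

    Partition key: (visitor_id, day); clustering key: ts.
    All names are hypothetical.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, visitor_id, day, ts, event_type):
        rows = self._partitions[(visitor_id, day)]
        bisect.insort(rows, (ts, event_type))  # keep rows ordered by ts

    def latest(self, visitor_id, day, limit):
        """Serve the target query by reading a single partition."""
        rows = self._partitions.get((visitor_id, day), [])
        return rows[max(0, len(rows) - limit):][::-1]


table = EventsByVisitorDay()
table.insert("v1", "2017-05-01", 30, "purchase")
table.insert("v1", "2017-05-01", 10, "page_view")
table.insert("v1", "2017-05-01", 20, "search")
recent = table.latest("v1", "2017-05-01", 2)
```

A different query, for example "all purchases across visitors", would need its own table keyed for that access pattern rather than a scan of this one.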
Data
Demographic:
• device
• location
• organisation
• contact details
• and more
Events:
• page views
• form fills
• searches
• purchases
• IVR / telephony
• custom events
• …
JSON
17
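Since events arrive as JSON and schemas are typically not stable, one possible approach is to validate only a small envelope and carry everything else through untouched. The sketch below assumes a hypothetical envelope of `type` and `timestamp`; these field names are illustrative and not taken from the talk.

```python
import json

REQUIRED = ("type", "timestamp")  # hypothetical minimal envelope


def parse_event(raw: str):
    """Return a normalised event dict, or None if the envelope is invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(k not in data for k in REQUIRED):
        return None
    envelope = {k: data[k] for k in REQUIRED}
    # Everything else is kept as free-form attributes, since schemas vary.
    envelope["attributes"] = {k: v for k, v in data.items() if k not in REQUIRED}
    return envelope


parsed = parse_event('{"type": "page_view", "timestamp": 1, "url": "/pricing"}')
```

Unknown fields survive in `attributes`, so a new custom event type does not require a parser change.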
Altocloud Data Platform
[Diagram. Labels: platform APIs; data APIs; data ingestion; message queues; data processors; query layer; storage layer.]
18
Goals for Analytics platform
Easy to scale
As real-time as possible
Performance vs. flexibility
~80% of queries known upfront
Limited resources
Low latency
19
Analytics
[Diagram with numbered callouts 1 to 7. Labels: APIs; message queues; data processors; query layer; storage layer; event storage; events; event metadata; dimensions; views; aggregations.]
20
Summary
Materialise views for buckets every minute
Hourly roll-ups on raw events
Some numbers:
• 1bn+ events / day on 8 cores (Spark)
• Sub-second query time
Lessons learned:
• Know your data partitioning
• Idempotent design is key!
21
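The combination of minute buckets, hourly roll-ups and idempotent design can be sketched in a few lines. This is an illustration under stated assumptions, not the Altocloud implementation: every event is assumed to carry a unique `id` and a `ts` in seconds, and raw events are stored as upserts keyed by `id`, so a redelivered event overwrites itself instead of being counted twice.

```python
from collections import defaultdict


class EventStore:
    """Raw events keyed by id; views are always derived from raw events."""

    def __init__(self):
        self._raw = {}  # event id -> event; rewriting the same id changes nothing

    def write(self, event):
        self._raw[event["id"]] = event  # upsert, safe to replay

    def _rollup(self, bucket_seconds):
        totals = defaultdict(int)
        for e in self._raw.values():
            totals[(e["type"], e["ts"] // bucket_seconds)] += 1
        return dict(totals)

    def minute_view(self):
        return self._rollup(60)

    def hourly_rollup(self):
        return self._rollup(3600)


store = EventStore()
events = [
    {"id": "e1", "type": "page_view", "ts": 0},
    {"id": "e2", "type": "page_view", "ts": 30},
    {"id": "e3", "type": "page_view", "ts": 61},
    {"id": "e4", "type": "page_view", "ts": 3600},
    {"id": "e2", "type": "page_view", "ts": 30},  # redelivered after a failure
]
for e in events:
    store.write(e)
```

Because both views are recomputed from deduplicated raw events, a failed job can simply be rerun without corrupting the counts.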
Outcome Probabilities
22
AI platform
Goal: predict the probability of customer X achieving goal Y
Train models per outcome and business (1000s)
Apply models to each event in real time (5s)
Flexibility to add new data features on demand
Different dataset sizes force different algorithms
23
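Training one model per outcome and business implies some kind of registry that is consulted for each incoming event. The sketch below is a guess at the shape of such a registry. The talk does not name the algorithms used, so a hand-set logistic model stands in for a trained one, and all names (`ModelRegistry`, `acme`, `pages_viewed`) are hypothetical.

```python
import math


class ModelRegistry:
    """Holds one model per (business, outcome) pair."""

    def __init__(self):
        self._models = {}

    def register(self, business, outcome, weights, bias):
        self._models[(business, outcome)] = (weights, bias)

    def score(self, business, event_features):
        """Return {outcome: probability} for every model of this business."""
        scores = {}
        for (biz, outcome), (weights, bias) in self._models.items():
            if biz != business:
                continue
            z = bias + sum(w * event_features.get(name, 0.0)
                           for name, w in weights.items())
            scores[outcome] = 1.0 / (1.0 + math.exp(-z))  # logistic function
        return scores


registry = ModelRegistry()
registry.register("acme", "purchase", {"pages_viewed": 0.5}, bias=-1.0)
registry.register("globex", "callback", {"pages_viewed": 0.1}, bias=0.0)
scores = registry.score("acme", {"pages_viewed": 2})
```

Missing features default to zero, which is one simple way to tolerate new data features being added on demand.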
Spark ML Pipeline
“Decode” Spark ML pipeline & stages
Combine feature & model pipelines per-outcome
“Compose” per-outcome pipeline in streaming
Apply different pipelines per event in streaming batch
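The composition idea on this slide can be shown without Spark. The plain-Python sketch below treats a stage as a function from record to record, composes feature and model stages into one pipeline per outcome, and applies the matching pipeline to each event in a micro-batch. It does not use the Spark ML API; the stage names and the fixed scoring rules are hypothetical stand-ins for trained models.

```python
from collections import defaultdict


def compose(*stages):
    """Chain stages (functions from record to record) into one pipeline."""
    def pipeline(record):
        for stage in stages:
            record = stage(record)
        return record
    return pipeline


# Hypothetical feature stages.
def add_page_count(r):
    return {**r, "page_count": len(r.get("pages", []))}


def add_is_returning(r):
    return {**r, "is_returning": 1 if r.get("visits", 0) > 1 else 0}


# Hypothetical model stages; fixed rules stand in for trained models.
def purchase_model(r):
    return {**r, "score": min(1.0, 0.2 * r["page_count"])}


def callback_model(r):
    return {**r, "score": 0.9 if r["is_returning"] else 0.1}


PIPELINES = {
    "purchase": compose(add_page_count, purchase_model),
    "callback": compose(add_is_returning, callback_model),
}


def score_batch(batch):
    """Apply the matching pipeline to each event in a micro-batch."""
    results = defaultdict(list)
    for event in batch:
        pipeline = PIPELINES[event["outcome"]]
        results[event["outcome"]].append(pipeline(event)["score"])
    return dict(results)


batch = [
    {"outcome": "purchase", "pages": ["/a", "/b"]},
    {"outcome": "callback", "visits": 3},
    {"outcome": "callback", "visits": 1},
]
result = score_batch(batch)
```

Keeping feature and model stages separate means a new outcome can reuse existing feature stages and only add its own model stage.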
Key takeaways
Streaming over batch - highly reactive, low latency
Design for idempotent processing: things will always fail
Open source is great (most of the time) and cheap
macdab@altocloud.com
25
Complex algorithms behind simple UX
26

2017 05 Hadoop User Group Meetup Dublin


Editor's Notes

  • #15: One code base for batch and streaming. Richer API (e.g. window functions). HDFS is the only requirement (that is, if you want to do checkpointing).
  • #16: Fast: a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Scalable: Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Durable: messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. Distributed by design: Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.