Open Source for Customer Analytics
Matthias Funke
Business & Technology Consultant
Agenda Topics
Open Source Software
Data Products
The “Data Process”
Tying it together
Open Source Software
Examples: Linux, LibreOffice, Eclipse, Hadoop
Source Code open, e.g. github.com (>3M users, 6.8M repos)
Governed by foundations, e.g. Apache Software Foundation, Free Software Foundation
Contributors / committers: academia, start-ups, corporations, specialised OSS companies
Popular Apache Software Projects
Project      Donated by...
Cassandra    Facebook (2008)
Storm        Twitter (2013)
Hadoop       Yahoo (2008)
Kafka        LinkedIn
Apache Software Foundation Sponsors
Google, Yahoo, Microsoft, Facebook, Citrix…
HP, IBM, Hortonworks, Cloudera, Comcast
Auto & General, Huawei, Pivotal, …
Talend, Twitter
Benefits, Drawbacks & Facts
Benefits
● No Licence Cost
● Huge amount of knowledge in the community
● High speed of innovation
● Funny names
Drawbacks
● Overwhelming choices
● Varying maturity
● Skills challenge (for newer projects)
Facts of Life
● Professional Services / Support not free
“Data Products”
Core: valuable data. Tools to display and manipulate.
Good: live, visual, searchable
Types:
● Exploratory
● Internal production
● Publicly facing (but free)
● Commercial = monetised
The four Vs: Volume, Variety, Velocity, Veracity
Popular Data Products
Google Flights (not a booking engine!)
CIA World Factbook (simple presentation)
Inside Airbnb (“activist”)
data.gov.uk
The Data Process
1. Obtain data
2. Explore & clean data
3. Analyse & model
4. Visualise
5. Productionise & automate Data Pipeline
a. How and where to distribute?
b. How to scale?
c. How to secure?
d. How to manage day-to-day?
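A minimal sketch of steps 1 and 2 (with a first taste of step 3) in base R. Everything in it is an assumption made for illustration: the inline CSV, its column names and its values are made up, and a real project would read a downloaded file instead.

```r
# Step 1: obtain data. A real project would read a downloaded file, e.g.
# read.csv("listings.csv"); here a small inline CSV stands in (made-up values).
raw_csv <- "id,neighbourhood,price,availability_365
1,Camden,$85.00,120
2,Camden,$110.00,
3,Hackney,$60.00,365
4,Hackney,,40
5,Islington,$95.00,200"

listings <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Step 2: explore and clean.
str(listings)  # quick look at column types

# Prices arrive as text with a currency symbol; convert them to numbers.
# Empty prices become NA.
listings$price <- as.numeric(gsub("[$,]", "", listings$price))

# Keep only rows where every column is present.
clean <- listings[complete.cases(listings), ]

# Step 3 (first pass): a simple summary of the cleaned data.
summary(clean$price)
```

Dropping incomplete rows is the simplest possible cleaning rule; whether it is appropriate depends on why the values are missing.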
Data Exploration on One PC
Using ggplot2 for exploratory graphs
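library(ggplot2)
# Assumes `host` is a data frame of listings with an availability_365 column.
# Note: qplot() is deprecated in recent ggplot2 releases; ggplot() + geom_histogram() is the current form.
qplot(host$availability_365,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Availability",
      xlab = "AirBnB in London",
      fill = I("blue"))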
Statistical Analysis
SIMPLE
● Sum, Count, Mean / Median
● Variance / Standard Deviation
E.g. Average Revenue per User per Neighbourhood (by Month of the Year)
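The example above could be computed along these lines in base R. This is a sketch under stated assumptions: the `tx` data frame, its column names and its values are hypothetical, and "revenue per user" is taken to mean each user's total within a neighbourhood and month.

```r
# Hypothetical transactions: one row per payment (made-up values).
tx <- data.frame(
  user          = c("u1", "u1", "u2", "u3", "u3", "u4"),
  neighbourhood = c("Camden", "Camden", "Camden", "Hackney", "Hackney", "Hackney"),
  month         = c(1, 1, 1, 1, 2, 2),
  revenue       = c(100, 50, 90, 80, 40, 60)
)

# Total revenue per user within each neighbourhood and month.
per_user <- aggregate(revenue ~ user + neighbourhood + month, data = tx, FUN = sum)

# Average of those per-user totals: ARPU per neighbourhood and month.
arpu <- aggregate(revenue ~ neighbourhood + month, data = per_user, FUN = mean)
print(arpu)

# The other simple statistics from the list, over the per-user totals.
c(sum = sum(per_user$revenue), count = nrow(per_user),
  mean = mean(per_user$revenue), median = median(per_user$revenue),
  var = var(per_user$revenue), sd = sd(per_user$revenue))
```

Aggregating in two passes (sum per user, then mean across users) keeps a user with many small payments from being counted several times in the average.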
MORE COMPLEX
● Clustering
● Covariance matrix (dependencies between variables)
● Predictive Models
● Machine Learning
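The first two items can be tried in a few lines of base R. The sketch below is illustrative only: the `customers` data frame and its values are made up, and k-means is just one of many clustering methods, chosen here because it ships with R.

```r
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability

# Hypothetical customer features (made-up values): two obvious groups.
customers <- data.frame(
  visits_per_month = c(1, 2, 1, 2, 9, 10, 11, 10),
  spend_per_visit  = c(10, 12, 11, 9, 48, 52, 50, 47)
)

# Covariance matrix: how the variables vary together.
print(cov(customers))
# Correlation is the same information rescaled to [-1, 1].
print(cor(customers))

# Clustering: scale first so neither variable dominates the distance, then
# ask k-means for two groups. nstart = 10 keeps the best of 10 random starts.
fit <- kmeans(scale(customers), centers = 2, nstart = 10)
customers$cluster <- fit$cluster
print(customers)
```

The cluster labels (1 or 2) are arbitrary; only the grouping of rows is meaningful.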
Big Data Architectures (simplified)
“Big” Database or Hadoop Cluster / File System
Query Engine (Data Access)
Execution Engine (Business Logic)
Search Engine (Accessibility)
Visualisation Layer
Visualisation using Kibana
Trusted Analytics Platform - Brand New OSS
Interactive Notebooks
New breed of software to work interactively on data
Spark/Scala Notebook
Apache Zeppelin
Databricks: cloud (proprietary but built on Spark)
