Qrious about Insights
Big Data in the Real World
AUT DSRG Workshop
Guy Kloss
guy.kloss@qrious.co.nz
Enterprise Architect
Qrious Limited
7 February 2017
The Problem Examples The Solution Tools of the Trade Boxing up a Solution Flotsam and Jetsam
Outline
1 The Problem
2 Examples
3 The Solution
4 Tools of the Trade
5 Boxing up a Solution
6 Flotsam and Jetsam
Guy Kloss | Big Data in the Real World 2/41
Who/What is Qrious?
We help New Zealand businesses
and public sector organisations
create value
and solve their most pressing business problems
by turning data into actionable insight.
Who/What is Qrious?
Backed by Spark
Approx. 60 employees
Offices in Auckland & Wellington
Substantial investment across Data, Platform & People
Built from the ground up
(new generation technology and working principles)
One of the largest Data Science teams in the country
with > 80% qualified to Master's & PhD level
and over 60 years of combined experience
NZ's leading data analytics specialist by 2017
Our Capabilities
Advanced analytics
Location insights
Big Data platforms
Consulting services
BI & Warehousing
Who am I?
Chemical Engineer (Masters)
Rocket Scientist (German Aerospace Centre)
Computer Scientist (PhD)
Former lecturer (AUT)
Lead Software Developer and Head Crypto Geek @ Mega
Enterprise Architect at Qrious
Dad, baseballer, diver, . . . general geek!
1 The Problem
Data size
Number of records
Data volume
An exponentially growing data world
Primary Memory/Disk Capacity
An exponentially growing data world
Relative Speeds
Source: http://www.cs.cmu.edu/~amarp/cpu-io-gap
Size Does Matter!
Access/processing beyond a single machine
(RAM, disk, CPU)
Expensive data transfers at volume
(latency, throughput)
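To put "expensive data transfers at volume" into numbers, a back-of-the-envelope estimate helps. This sketch assumes an idealised, fully utilised link and ignores latency and protocol overhead, so real transfers will be slower; the sizes and the 1 Gbit/s link speed are illustrative assumptions, not figures from the talk.

```python
def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealised time to move size_bytes over a link, ignoring latency and overhead."""
    return size_bytes * 8 / link_bits_per_second

TB = 10**12   # decimal terabyte
GBIT = 10**9  # 1 Gbit/s link (assumed)

for label, size in [("1 TB", TB), ("100 TB", 100 * TB), ("1 PB", 1000 * TB)]:
    seconds = transfer_time_seconds(size, GBIT)
    print(f"{label:>6} over 1 Gbit/s: {seconds / 3600:8.1f} h")
```

Even under these ideal conditions, 1 TB takes roughly 2.2 hours and 1 PB roughly 93 days, which is why moving the data is often the thing to avoid.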
Storage Issues
Storage, access, index, find
Transfer, manage, prevent data loss
Types of Data
Structured
Unstructured
Graphs
Free text
. . .
Correlating . . . co-relating . . . mashing . . .
Not a single-record problem
But an m : n problem
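Why correlation is an m : n problem: joining two data sets on a shared key pairs every matching record on one side with every matching record on the other, so output and work can grow multiplicatively. A minimal hash-join sketch in plain Python; the record layout and field names are hypothetical.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Join two lists of dicts on `key`; each left match pairs with every right match."""
    index = defaultdict(list)
    for r in right:                # build phase: index the right-hand side by key
        index[r[key]].append(r)
    joined = []
    for l in left:                 # probe phase: one output row per matching pair
        for r in index.get(l[key], []):
            joined.append({**l, **r})
    return joined

# Hypothetical data: 3 location events and 2 external records share one cell id.
events = [{"cell": "A", "t": t} for t in range(3)]
external = [{"cell": "A", "poi": p} for p in ("cafe", "station")]
print(len(hash_join(events, external, "cell")))  # 3 x 2 = 6 pairs
```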
Beyond Exponential
Problems are between exponential and hyperexponential
→ Enabling data processing in an exponential world
2 Examples
Number of Records
> 1 billion (10⁹) records: Spark’s location-based data set
Anonymised for privacy (on ingest)
Fully encrypted (at rest and in transport)
Continuous/stream ingestion
Normalisation and segmentation of the data set
Correlation with external data sets
→ Finding insights in this “hay mountain”
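The talk does not say how anonymisation on ingest is done. One common approach, shown here purely as an assumed illustration, is keyed pseudonymisation: replace the direct identifier with an HMAC-SHA256 of it under a secret key, so records about the same subscriber can still be correlated without exposing the identifier. Pseudonymisation alone is weaker than full anonymisation; the key and field names below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key store, not source code.
PSEUDONYM_KEY = b"replace-with-a-secret-from-a-key-store"

def pseudonymise(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable, keyed pseudonym for an identifier (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def ingest(record: dict) -> dict:
    """Replace the direct identifier before the record is stored (hypothetical fields)."""
    cleaned = dict(record)
    cleaned["subscriber"] = pseudonymise(cleaned.pop("phone_number"))
    return cleaned

print(ingest({"phone_number": "+64 21 000 0000", "cell": "A", "ts": 1486425600}))
```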
Data Volume
100s of TB to PB of “Data Lakes”
Not just a backup/data grave
Fully encrypted (at rest and in transport)
Includes data querying and processing capability
→ Capability to “store everything” (every thing, of every kind)
3 The Solution
Divide and Conquer
Massively parallel processing: MPP
Parallelise: Map-Reduce
Pipelines: Stream processing
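Map-Reduce splits a job into a map step applied independently to each input chunk, a shuffle that groups intermediate pairs by key, and a reduce step per key. This in-process sketch only illustrates the data flow; a framework such as Hadoop MapReduce runs the map and reduce tasks on many machines, which this code does not do.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk: str):
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

chunks = ["big data big insights", "data in the real world"]
# Each chunk could be mapped on a different node.
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```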
Leverage Data Locality
Bring processing to the data
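"Bring processing to the data" means a scheduler prefers to run a task on a node that already holds a copy of the task's input, and only falls back to a remote node (shipping the data over the network) when no such node is free. A simplified greedy sketch; the block-to-node map, node names and one-task-per-node capacity are assumptions, and real cluster schedulers typically also weigh rack locality, capacity and fairness.

```python
def schedule(block_locations: dict, free_nodes: set):
    """Assign each block's task to a node holding a replica, else to any free node.

    block_locations maps block id -> set of nodes storing a replica of that block.
    Each node runs at most one task in this simplified model.
    Returns (assignments, remote_blocks).
    """
    free = set(free_nodes)
    assignments, remote = {}, []
    for block, replicas in sorted(block_locations.items()):
        local = sorted(replicas & free)
        if local:
            node = local[0]          # data-local: no network transfer of the input
        elif free:
            node = sorted(free)[0]   # remote: input must be shipped to this node
            remote.append(block)
        else:
            break                    # no capacity left in this round
        assignments[block] = node
        free.discard(node)
    return assignments, remote

blocks = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"}, "b3": {"n1", "n3"}, "b4": {"n1"}}
print(schedule(blocks, {"n1", "n2", "n3", "n4"}))
```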
The Right Tools
Don’t re-invent the wheel
Use existing high performing tools where possible
Use high-productivity frameworks built on high-level languages
The right tool for the type of data
Use the Source, Luke!
(Leverage open source based tooling with a community)
The Right Data Organisation
Row vs. columnar storage
→ Columnar formats are often better for analytics
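Why columnar storage often wins for analytics: an aggregate over one attribute only needs that attribute's values, which a column store keeps together, while a row store has to read whole records. This toy sketch models the two layouts with plain Python lists; real columnar formats (for example Parquet or ORC) add compression and encoding on top of the same idea.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"id": 1, "region": "AKL", "bytes": 120},
    {"id": 2, "region": "WLG", "bytes": 300},
    {"id": 3, "region": "AKL", "bytes": 80},
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of per-attribute lists (a columnar layout)."""
    return {name: [r[name] for r in records] for name in records[0]}

columns = to_columns(rows)

# A row store on disk reads whole records to reach one attribute;
# here that is modelled by iterating over the full record dicts.
total_from_rows = sum(r["bytes"] for r in rows)

# A column store reads only the "bytes" column.
total_from_columns = sum(columns["bytes"])

print(total_from_rows, total_from_columns)
```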
In, Out, Cha-Cha-Cha
Ingest data from (legacy, external) source systems
→ ETL – Extract, Transform, Load
Make sure the rhythm fits (no missing “Out”)
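A minimal Extract, Transform, Load pipeline using only the Python standard library: extract records from a CSV export of a hypothetical source system, transform them (normalise, convert types, reject bad rows), and load them into an in-memory SQLite database standing in for the target store. Column names and validation rules are illustrative assumptions. Reconciling row counts is one simple way to make sure no "Out" goes missing.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export from a legacy source system.
SOURCE = """customer_id,region,spend
101, akl ,12.50
102,wlg,7
103,,3.20
104,chc,not-a-number
"""

def extract(text):
    """Extract: parse raw CSV rows into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: normalise region codes, convert types, and reject invalid rows."""
    good, rejected = [], []
    for rec in records:
        region = rec["region"].strip().upper()
        try:
            spend = float(rec["spend"])
        except ValueError:
            rejected.append(rec)
            continue
        if not region:
            rejected.append(rec)
            continue
        good.append((int(rec["customer_id"]), region, spend))
    return good, rejected

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS spend (customer_id INTEGER, region TEXT, spend REAL)")
    conn.executemany("INSERT INTO spend VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
raw = extract(SOURCE)
good, rejected = transform(raw)
load(conn, good)
loaded = conn.execute("SELECT COUNT(*) FROM spend").fetchone()[0]
# Reconcile: every extracted row is either loaded or explicitly rejected.
assert loaded + len(rejected) == len(raw)
print(loaded, "loaded,", len(rejected), "rejected")
```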
4 Tools of the Trade
Hadoop
Hadoop and distributions
Processing tools for relational, streaming, batch, graph, text, search, . . .
Allocates cluster resources dynamically
Data is distributed (with redundancy),
so compute is allocated where the data is
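Distributing data with redundancy can be sketched as follows: a file is cut into fixed-size blocks and each block is stored on several distinct nodes, so losing one node loses no data and there are several candidate nodes to run computation on. The 128 MiB block size and replication factor of 3 mirror common HDFS defaults, but the round-robin placement is a simplification; HDFS's actual replica placement is rack-aware.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, a common HDFS default block size
REPLICATION = 3                 # common HDFS default replication factor

def place_blocks(file_size: int, nodes: list, block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> dict:
    """Split a file into blocks and assign each block to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(n_blocks):
        # Consecutive nodes from a per-block rotating offset spread the load.
        placement[b] = [nodes[(b + i) % len(nodes)] for i in range(replication)]
    return placement

nodes = ["n1", "n2", "n3", "n4", "n5"]
layout = place_blocks(1024**3, nodes)  # a 1 GiB file -> 8 blocks
print(len(layout), layout[0], layout[7])
```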
Hadoop Distributions
Many Hadoop distributions: Similar to Linux distributions
Cloudera Partnership with Qrious
“Bronze” partner
Ambitions to become “Silver” partner
and MSP (managed service provider)
Basic Hadoop Tool Suite
Example: Cloudera Hadoop Distribution
MPP Databases
DB for massively parallel processing (MPP)
Greenplum database and forks
(based on PostgreSQL)
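In an MPP database such as Greenplum, each table is spread across segments by a distribution key (Greenplum's `DISTRIBUTED BY` clause), and every segment works on its own share in parallel. When a query groups by the distribution key, all rows for a key sit on one segment, so each segment can finish its part without exchanging data. This Python simulation illustrates the idea; the segment count, table and hash function are assumptions, not Greenplum internals.

```python
import zlib
from collections import Counter, defaultdict

N_SEGMENTS = 4  # assumed number of segments in the cluster

def segment_for(key: str) -> int:
    """Map a distribution-key value to a segment using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % N_SEGMENTS

def distribute(rows, key):
    """Spread rows across segments by the hash of their distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key])].append(row)
    return segments

def segment_sum(rows, key, value):
    """Aggregate locally on one segment (could run in parallel with the others)."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row[value]
    return totals

usage = [{"cust": c, "mb": mb} for c, mb in
         [("c1", 10), ("c2", 4), ("c1", 5), ("c3", 7), ("c2", 1)]]
segments = distribute(usage, "cust")
partials = [segment_sum(rows, "cust", "mb") for rows in segments.values()]

# Every key lives on exactly one segment, so partial results never overlap.
result = {}
for p in partials:
    assert result.keys().isdisjoint(p)
    result.update(p)
print(result)
```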
Generic and Specialised DBs
Generic RDBMS (where useful)
NoSQL
Graph DB
Other columnar species
5 Boxing up a Solution
Delivering a Suitable Solution
Includes:
System management
Connectivity
Application logic
Services
Yummy add-ons
System Management Framework
Security
Dedicated sub-networks with specific firewall rules
External firewalls
User and credentials management
Log collector
Other security tools . . .
System access
VPN
Remote desktop services
Connectivity
API gateways
(Reverse) proxies
SFTP
Application Logic
Platform-as-a-Service (PaaS)
Huge benefits of containerising application logic (using Docker)
→ Much shorter delivery cycles
APIs, Micro-Services
Orchestration of Big Data analysis
Services
Solutioning, build
Analytics and development
Operation and maintenance
Bonus Points for . . .
Provenance
(reproducibility, auditability, compliance)
AI and ML
Blockchain
(non-repudiation, trust, “smart contracts”,
identity management, federation, . . . )
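Provenance and the tamper-evidence side of blockchains share a simple building block: a hash chain, where each log entry includes the hash of the previous one, so altering any past entry breaks every later link. This sketch uses SHA-256 from the standard library; the entry fields are hypothetical, and a real system would also need digital signatures for non-repudiation and replicated storage for trust.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON encoding of the payload."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def append(log: list, payload: dict) -> None:
    """Append a provenance record linked to the previous one."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})

def verify(log: list) -> bool:
    """Recompute every link; any edit to an earlier entry makes verification fail."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"step": "ingest", "source": "feed-a", "rows": 1000})
append(log, {"step": "transform", "job": "normalise-v2", "rows": 998})
print(verify(log))               # True
log[0]["payload"]["rows"] = 999  # tamper with history
print(verify(log))               # False
```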
6 Flotsam and Jetsam
In the Qrious Pipeline
Make Big Data a commodity: Don’t buy, pay for what you need!
→ Big-Data-Platform-as-a-Service (BDPaaS)
Sliced, diced and configured to your needs
Straight on bare metal,
not on VMs (like most cloud hosting providers)
Maximising Your Job Market Prospects
What skills do you need?
RDBMS?
SAS?
NoSQL DBs?
Maybe Hadoop is a good answer?
Questions?
Parallelise!
Guy Kloss
guy.kloss@qrious.co.nz
Just a humble hair dryer from the 1930s:
“One of the first machines used for
permanent wave hairstyling back in the
1920’s and 1930’s.”
Dark Roasted Blend:
http://www.darkroastedblend.com/2007/05/mystery-devices-issue-2.html