Making Big Data work
Lewis Crawford
Principal Architect @ the DataShed
thedatashed.co.uk
Lewis@thedatashed.co.uk
© the DataShed Limited 2015
intro
Who am I?
• For the last 3 years, the DataShed has been providing consultancy services to a vast array
of large clients. Our primary focus is ensuring that technology and analytical strategies
are truly aligned so that businesses can leverage the latest and greatest in technology to
model, mine and describe their data asset.
• We were working with Big Data technology before the term was coined. We have
experience delivering analytical systems driven by petabyte-scale data sets, and have designed,
implemented and supported one of the largest real-time data integration and predictive
analytics platforms in the aviation world.
• Our model is to use a small number of exceptionally skilled individuals to
deliver disruptive and innovative solutions in an agile, delivery-focused manner.
So what is ‘Big Data’?
Why do Big Data projects fail?
Too many people think that Big Data is:
“The belief that the more data you have, the more insights and
answers will rise automatically from the pool of ones and zeros.”
Gil Press, Forbes.com
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
Real-time data
Continuous Integration Demo
How to make Big Data work?
1. Understand your problem
2. Apply appropriate tools
3. Automate everything.
Little Big Data
A problem closer to home…
• Every business needs to understand:
• Their potential customers and market
• Current customers
• Their products and sales
• How and when they engage prospects and customers
• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed provides Analytics as a Service and Single Customer View as a
Service.
The deduplication problem…
• An SME has 250,000 customers (two systems of record)
• Brute-force duplicate identification: 31,249,875,000 pairwise
comparisons
• Building a system to process a minimum of 100 clients a day…
• Roughly 3.1 trillion record pairs to compare, using more than 10 different algorithms
• A traditional scale-up approach would be expensive, and relies on strong
assumptions about blocking and partitioning rules
• A small data problem but a big data solution?
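The figures above follow from counting unordered pairs: n records need n(n-1)/2 comparisons. A minimal sketch of the arithmetic, assuming every client is the same size as the 250,000-customer example (the function and variable names are illustrative only):

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2

customers = 250_000      # customers held by one SME client
clients_per_day = 100    # minimum daily throughput quoted on the slide

per_client = pairwise_comparisons(customers)
per_day = per_client * clients_per_day

print(f"{per_client:,}")  # 31,249,875,000
print(f"{per_day:,}")     # 3,124,987,500,000 (about 3.1 trillion)
```

Each of those pairs would then be scored by every matching algorithm in use, so the real workload is a further multiple of the daily figure.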
Title  First Name  Surname  Address 1       Address 2    Address 3
Dr     R J         Smith    Two Oaks        112 Old St.  County Durham
Mrs    Robyn       Smith    112 Old Street  Durham       DH1 5YJ
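The two example rows hint at why blocking rules carry assumptions. A rough sketch, assuming a hypothetical blocking key of surname plus postcode and a very simple street-name normalisation (neither is taken from the DataShed's system): the strict key never compares the two Smith records, because the first row has no postcode, while a looser surname-only key does.

```python
import re
from collections import defaultdict
from itertools import combinations

# The two rows from the slide; the dict layout is an assumption for this sketch.
records = [
    {"id": 1, "title": "Dr", "first": "R J", "surname": "Smith",
     "address": ["Two Oaks", "112 Old St.", "County Durham"]},
    {"id": 2, "title": "Mrs", "first": "Robyn", "surname": "Smith",
     "address": ["112 Old Street", "Durham", "DH1 5YJ"]},
]

# Loose UK postcode pattern, good enough for this illustration only.
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

def postcode(rec):
    """Return the first address line that looks like a postcode, else ''."""
    for line in rec["address"]:
        if POSTCODE.match(line.upper()):
            return line.upper()
    return ""

def normalise(line):
    """Lower-case, drop punctuation, expand 'st' to 'street'."""
    words = re.sub(r"[^\w\s]", "", line.lower()).split()
    return " ".join("street" if w == "st" else w for w in words)

def candidate_pairs(recs, key):
    """Group records by blocking key and only pair records within a group."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[key(rec)].append(rec)
    pairs = []
    for block in blocks.values():
        pairs.extend(combinations(block, 2))
    return pairs

def shared_address_lines(a, b):
    """Address lines that are identical after normalisation."""
    return {normalise(l) for l in a["address"]} & {normalise(l) for l in b["address"]}

strict = candidate_pairs(records, lambda r: (r["surname"].lower(), postcode(r)))
loose = candidate_pairs(records, lambda r: r["surname"].lower())

print(len(strict))                      # 0: the pair is never compared
print(len(loose))                       # 1
print(shared_address_lines(*loose[0]))  # {'112 old street'}
```

Even when the pair is compared, "R J" against "Robyn" and "County Durham" against "Durham" need different kinds of matching, which is one reading of why the slide mentions more than 10 algorithms.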
The Shed demo
How to make Big Data work?
1. Understand your problem
• ‘Big Data’ challenges aren’t necessarily new; much of the technology is
• Articulate and communicate – focus on distilling your problem down
• Incremental improvement, not wholesale replacement
2. Apply appropriate tools
• Understand the economics as well as the technology
• New technologies need to be evaluated within the context of your problem scope
• New technologies are enablers, not deliverables (#datalake)
• ‘Big Data’ technology should be seen as complementary to existing technology
3. Automate everything
• Continuous integration to include all testing
• Containerise where possible
• Measure everything
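One possible reading of "continuous integration to include all testing" and "measure everything" is to write data checks as ordinary tests and to time every pipeline step. The sketch below is an assumption about what that could look like, not the tooling shown in the demo; the step, the decorator and the sample rows are all hypothetical.

```python
import time
from functools import wraps

timings = {}  # step name -> list of durations in seconds

def measured(func):
    """Record how long each call to a pipeline step takes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
    return wrapper

@measured
def drop_exact_duplicates(rows):
    """Hypothetical pipeline step: remove rows that are exactly equal, keeping order."""
    return list(dict.fromkeys(rows))

def test_drop_exact_duplicates():
    """A data check written as a plain test, so a CI job can run it on every commit."""
    rows = [("Smith", "DH1 5YJ"), ("Smith", "DH1 5YJ"), ("Jones", "LS1 4AP")]
    assert drop_exact_duplicates(rows) == [("Smith", "DH1 5YJ"), ("Jones", "LS1 4AP")]

test_drop_exact_duplicates()
print(len(timings["drop_exact_duplicates"]))  # 1
```

A test runner such as pytest would discover `test_drop_exact_duplicates` by its name, and the recorded timings could be shipped to whatever monitoring is already in place.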
If you really want to get involved…
Get your hands dirty
If you’re interested in learning more, we’ll be hosting a hands-on labs
event in the near future.
Send your details to:
Email: hello@thedatashed.co.uk
Twitter: @thedatashed
Any questions?

Editor's Notes

  • #7: http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/
    I like the last two. #11 is a warning against blindly collecting more data for the sake of collecting more data (see NSA). #12 is an acknowledgment that storing data in “data silos” has been the key obstacle to getting the data to work for us, to improve our work and lives. It’s all about attitude, not technologies or quantities.