Data Science
an unexpected journey to the
Digital Enterprise
Jake Bouma, April 2017
What I’m about
“Open Data Science is not a single technology, but a revolution
within the data science community; an inclusive movement that makes
open source tools for data science - data, analytics and computation -
easily work together as a connected ecosystem.”
https://www.linkedin.com/in/jaketbouma
https://github.com/jaketbouma
Project Switzerland
(we’re neutral like that)
“To develop the business
intelligence engine that enables BLT
to aggregate, process, analyse and
prioritise large sets of customer data
to reveal patterns and trends for
improved customer engagement,
increased revenue and operating
efficiencies”
On your marks, get set, data science!?
CEO
Mark Levy
Data Scientist
Jake Bouma
CIO
Andrew Murray
The first two months
• Extensive meetings with Business Units to identify opportunities
• The data battle: building relationships and scraping sources together
• Hacking out proofs of concept (POCs)
<Some preliminary results that I couldn’t share publicly>
Value Demonstrated
“Data is the organization’s
most underutilized asset.”
-- Andrew Murray
Actually, this is a big data problem
To Hadoop or not to Hadoop
This thing is big.
Data science works better on more data.
The most interesting data is messy.
Speed is king.
Performant access to the most granular, rawest form of everything
Hadoop has taught us some important lessons
Stop trying to force one tool to do everything.
Scale out works, send the code to the data.
Relational databases are still quite nice.
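The "send the code to the data" lesson can be illustrated with a toy map/reduce in plain Python. This is a minimal single-process sketch, not Hadoop's API: the node names, the `map_partition`/`combine`/`run_job` helpers and the click-stream records are all hypothetical. Each partition is summarised where it sits, and only the small per-partition summaries are brought together for the final merge.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy "cluster": each node holds its own partition of raw records.
# In a real scale-out system the partitions would live on different machines.
NODES = {
    "node-1": ["click", "view", "click"],
    "node-2": ["view", "purchase"],
    "node-3": ["click", "purchase", "purchase"],
}


def map_partition(records):
    """Runs where the data lives: shrink a whole partition to a small summary."""
    return Counter(records)


def combine(a, b):
    """Merge two partial summaries; only these small objects need to move."""
    return a + b


def run_job(nodes, mapper, reducer):
    # Apply the mapper to each partition in place, then reduce only the
    # per-node summaries instead of shipping every raw record to one place.
    partials = [mapper(partition) for partition in nodes.values()]
    return reduce(reducer, partials, Counter())


if __name__ == "__main__":
    print(dict(run_job(NODES, map_partition, combine)))
```

The design choice mirrors the slide: the raw records never leave their partition, so network traffic scales with the size of the summaries rather than the size of the data.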
DevOps for big data
DATA
COMPUTE
Build what you need, when you need it
Decouple data from applications
Always be ready to scale
6 Months later: State of the art BDP
* 3 networks, >10 different server types, >20 running servers, cloud native
BDP cruising
“There is just too much going
on.”
-- Jake Bouma
Agile Methodologies to the Rescue!
Small ideas
Small parts of a big idea
Big ideas
Fail fast
Prioritize and track
Long-term objectives: timescale of quarters
Timescale of weeks
Building out a team
“Practical solutions to not very well
defined problems.”
-- Owen Zhang
Top-ranked Kagglers list these backgrounds on their profiles:
• Electronics Engineer
• Quantitative Researcher
• Accounting, Finance and Risk
• Computer Science
• Credit risk modelling
• Entrepreneur
• Actuarial science
• Business Analyst
• No Physicists!?
Are we there yet?
“I think I missed a trick along
the way...”
-- Jake Bouma
Culture is the spice that makes it Digital
Agile Platform
BIG DATA TECHNOLOGY
DATA SCIENCE
OPENNESS
CULTURE
AGILE
Questions, Action, Feedback, Improvement
DEVOPS
The working digital system of insight?
AI v1.0
Questions, Action, Feedback, Improvement
https://www.linkedin.com/in/jaketbouma
https://github.com/jaketbouma
https://twitter.com/jaketbouma
Something to think about

Data Science towards the Digital Enterprise