INTRODUCING THE CONCEPTS OF DATA SCIENCE AND BIG DATA IN A TRADITIONAL DATA WAREHOUSE COMPANY
Joris Bos
Manager IT, DDW
Dairy Data Warehouse (DDW)
• Startup founded in 2013
• Based in Assen, Netherlands
• DDW offers data services to dairy farm consultants and corporate stakeholders in the dairy industry across the globe
DDW BIG Dairy Data Platform
1. DATA INPUT: herd management, data capture devices, milk recording data
2. DATA TRANSFORMATION: DDW's unique ETL technology transforms all data into a standard, common format
   Dairy Data Warehouse: a single consistent database (production, reproduction, disease)
3. DEEP LEARNING
4. DDW PREDICTIONS
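The transformation step above (many source formats in, one standard format out) can be sketched as a set of per-source converters that all emit the same record type. This is a minimal sketch only, not DDW's ETL technology: the source names, field names, date formats and the `MilkRecord` schema are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict

KG_PER_LB = 0.45359237  # exact definition of the avoirdupois pound


@dataclass(frozen=True)
class MilkRecord:
    """Hypothetical common format: one milk yield per cow per day, in kg."""
    farm_id: str
    cow_id: str
    day: date
    milk_kg: float


def from_herd_management(row: dict) -> MilkRecord:
    # Hypothetical source: ISO dates, yield already in kilograms.
    return MilkRecord(
        farm_id=row["farm"],
        cow_id=row["animal"],
        day=date.fromisoformat(row["date"]),
        milk_kg=round(float(row["yield_kg"]), 2),
    )


def from_milk_recording(row: dict) -> MilkRecord:
    # Hypothetical source: day-month-year dates, yield in pounds.
    return MilkRecord(
        farm_id=row["herd_id"],
        cow_id=row["cow"],
        day=datetime.strptime(row["test_date"], "%d-%m-%Y").date(),
        milk_kg=round(float(row["milk_lb"]) * KG_PER_LB, 2),
    )


# One converter per source system; adding a source means adding one function.
CONVERTERS: Dict[str, Callable[[dict], MilkRecord]] = {
    "herd_management": from_herd_management,
    "milk_recording": from_milk_recording,
}


def normalize(source: str, row: dict) -> MilkRecord:
    """Map a raw row from a known source system onto the common format."""
    try:
        convert = CONVERTERS[source]
    except KeyError:
        raise ValueError(f"unknown source system: {source!r}") from None
    return convert(row)
```

Because every converter returns the same frozen record, the same cow-day reported by two different systems ends up as two equal `MilkRecord` values, which is what makes a "single consistent database" possible downstream.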
‘Advance to go’
• Daily uploads (batching)
• Slow database XML exports
• Calculations freezing the data scientist's PC
• Valuable data locked away in private networks and widely scattered
• This new ‘thing’!
Data warehouses are dead! Or?
• Trendy, all will be superseded
• New paradigm
• On premises or Cloud?
• Clearly differentiate use-cases, strengths
• HDFS != DMS
• Different target audience
• Aim to retain value from previous investments
DDW Big data infrastructure
• Set up from scratch: an opportunity to adopt best practices
• Security, data access and data governance at the core
• New roles and competences within the organization
• Data lake at the center
• Separation of storage and compute
• Support for persisted, historical data now (batch)
• Streaming (real-time) in the near future
• Audit, audit, audit, audit
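Two of these principles, an append-only raw zone for persisted batch data and an audit trail on every access, can be sketched with a local directory standing in for the lake. This is an illustrative assumption only: `AuditedLake`, the `raw/<source>/ingest_date=<date>/` layout and the JSON-lines audit log are hypothetical and do not use the Azure Data Lake API; a managed platform would typically provide its own access logging.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


class AuditedLake:
    """Hypothetical local stand-in for a data lake that audits every access."""

    def __init__(self, root) -> None:
        self.root = Path(root).resolve()
        self.audit_log = self.root / "_audit" / "access_log.jsonl"
        self.audit_log.parent.mkdir(parents=True, exist_ok=True)

    def _relative(self, path) -> str:
        # Refuse paths outside the lake so every access stays auditable.
        try:
            return Path(path).resolve().relative_to(self.root).as_posix()
        except ValueError:
            raise PermissionError(f"{path} is outside the lake") from None

    def _audit(self, user: str, action: str, path: Path) -> None:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "path": self._relative(path),
        }
        # Append-only: existing audit lines are never rewritten.
        with self.audit_log.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def write_raw(self, user: str, source: str, ingest_date: str, name: str, data: bytes) -> Path:
        """Land a batch file in the raw zone, partitioned by source and ingestion date."""
        path = self.root / "raw" / source / f"ingest_date={ingest_date}" / name
        rel = self._relative(path)  # validate before writing anything
        if path.exists():
            # Raw data is immutable history; corrections arrive as new batches.
            raise FileExistsError(f"{rel} already exists")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        self._audit(user, "write", path)
        return path

    def read(self, user: str, path) -> bytes:
        path = Path(path)
        self._relative(path)  # validate before touching the file
        data = path.read_bytes()
        self._audit(user, "read", path)
        return data

    def audit_entries(self) -> List[dict]:
        if not self.audit_log.exists():
            return []
        with self.audit_log.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

Compute (a data scientist's notebook, an on-demand cluster) only ever reaches the data through `read`, so storage stays the single, audited source of truth while compute can come and go.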
Data Lake & Data Warehouse
Big Data pipeline: Data Engineer → Data Scientist → Value
Chosen setup (Azure)
• Storage: Azure Data Lake (Active Directory security / Office 365)
• Integration runtimes (Data Factory)
• ELT
• Data warehouse as ‘just’ another data source
• Data Catalog
• Compute: on-demand clusters with Hortonworks Data Platform (HDP)
• ML / deep learning: model computation on GPU-powered VMs
• Going forward: ETL into the data warehouse for the hot layer
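Choosing ELT and treating the data warehouse as "just" another source means tables are first copied into the lake unchanged (extract + load) and only transformed later on lake compute. A minimal sketch, assuming an in-memory SQLite database as a stand-in for the warehouse and a local folder for the lake; the table `milk`, its columns and both function names are hypothetical, and in the Azure setup above the copy step would presumably be handled by Data Factory rather than hand-written code.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Dict


def extract_load(conn: sqlite3.Connection, table: str, lake_raw: Path) -> Path:
    """E + L: copy a warehouse table into the lake's raw zone without changing it."""
    # The table name is interpolated, so it must come from trusted configuration.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    out = Path(lake_raw) / "dwh" / f"{table}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(cur)
    return out


def transform_avg_yield(raw_csv: Path) -> Dict[str, float]:
    """T: runs afterwards, on lake compute, against the raw copy only."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    with Path(raw_csv).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cow = row["cow_id"]
            totals[cow] = totals.get(cow, 0.0) + float(row["milk_kg"])
            counts[cow] = counts.get(cow, 0) + 1
    return {cow: totals[cow] / counts[cow] for cow in totals}
```

Keeping the raw copy untouched means the transformation can be rerun or changed later without going back to the warehouse.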
Chosen setup
Takeaways
• Establish clear language around Big Data and tooling
• Clearly position Big Data in your company (vision, budget, etc.)
• Define new roles, workflows and conventions
• Stick to best practices
• Start small; determine a minimum viable product (MVP)
• Cloud is great for ad-hoc and scalability demands
• Iterate and experiment through proofs of concept (POCs)
Thank you
Feel free to contact me:
Joris Bos
J.bos@dairydatawarehouse.com

