Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Unlocking New Big Data
Insights with Hadoop & MySQL
Ricky Setyawan
MySQL Principal Consultant - ASEAN
An Avalanche of Data
Create Value
Big Data: What It Is, What It Means
Volume
Variety
Velocity
What’s Changed?
• Enablers
– Digitization – nearly everything has a digital heartbeat
– Ability to store much larger data volumes (distributed file systems)
– Ability to process much larger data volumes (parallel processing)
• Why is this different from BI/DW?
– Business formulated questions to ask upfront
– Drove what data was collected, the data model, and the query design
Big Data enables what-if analysis and real-time discovery
Big Data Adoption
• Web Recommendations
• Sentiment Analysis
• Marketing Campaign Analysis
• Customer Churn Modeling
• Fraud Detection
• Research and Development
• Risk Modeling
• Machine Learning
Leading Use-Case, On-Line Retail
Users
Browsing
Recommendations
Profile, Purchase History
Web Logs:
Pages Viewed
Comments Posted
Social media updates
Preferences
Brands “Liked”
Recommendations
Why Hadoop?
• Scales to thousands of nodes, PB of structured and unstructured data
– Combines data from multiple sources, schema-less
– Run queries against all of the data
• Runs on commodity servers that handle both storage and processing
• Data replicated, self-healing
• Initially just batch (Map/Reduce) processing
– Extending with interactive querying, via Apache Drill, Cloudera Impala, Stinger etc.
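The batch Map/Reduce model mentioned above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the log format ("user page") and all function names are assumptions made for illustration, and a real job would distribute the map and reduce phases across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(log_lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (page, 1) for every well-formed 'user page' log line."""
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip malformed lines instead of failing
            yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_page_views(log_lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(log_lines)))


# Hypothetical web log sample: one "user page" pair per line.
SAMPLE_LOG = ["alice /home", "bob /cart", "alice /cart", "malformed"]

if __name__ == "__main__":
    print(count_page_views(SAMPLE_LOG))
```

Because each map call looks at one line and each reduce call looks at one key, both phases can run in parallel on separate nodes, which is what lets the model scale out.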
Big Data Lifecycle
Better Decisions Using Big Data
ACQUIRE → ORGANIZE → ANALYZE → DECIDE
CREATE VALUE FROM DATA
Big Data Lifecycle
Better Decisions Using Big Data
ACQUIRE
CREATE VALUE
FROM DATA
MySQL Database
MySQL Cluster
JSON Support
NoSQL Interfaces
MySQL Fabric
MySQL 5.7 Sysbench Benchmark: SQL Point Selects
3x Faster than MySQL 5.6
1,600,000 QPS
Chart: MySQL 5.7 Sysbench OLTP Read Only (SQL Point Selects). Queries per second (0 to 1,800,000) versus connections (8 to 1,024) for MySQL 5.7, MySQL 5.6, and MySQL 5.5.
Test system: Intel(R) Xeon(R) CPU E7-8890 v3, 4 sockets x 18 cores-HT (144 CPU threads), 2.5 GHz, 512GB RAM, Linux kernel 3.16
Hybrid Database: Rock Solid Reliability + Flexibility
MySQL 5.7
JSON Support
Traditional RDBMS: Proven, transactional, secure. Complex JOINs and queries. Extensive operational tools.
NoSQL Solutions: Flexible. Easy-to-use. Schema-less document storage.
Modern Applications: Require agile development and operations with robust data protection and security.
Hybrid Database: No trade-offs, best of both worlds. ACID properties & reliability of RDBMS + flexible document management.
MySQL NoSQL Interfaces: Fast, Flexible, Safe
Blazing Fast Key/Value Queries
Fully Transactional/ACID
NoSQL and SQL Across the Same Data Set
Combined with Schema Flexibility: Online DDL
NoSQL Interfaces to MySQL Cluster
MySQL Cluster Data Nodes
Clients
Application Layer
Data Layer
• Memory optimized tables
– Durable
– Mix with disk-based tables
• Massively concurrent OLTP
• Distributed Joins for analytics
• Parallel table scans for non-indexed searches
• MySQL Cluster 7.4 FlexAsync
– 200M NoSQL Reads/Second
MySQL Cluster 7.4 NoSQL Performance
200 Million NoSQL Reads/Second
Chart: FlexAsync reads per second (0 to 250,000,000) versus number of data nodes (2 to 32).
• Memory optimized tables
– Durable
– Mix with disk-based tables
• Massively concurrent OLTP
• Distributed Joins for analytics
• Parallel table scans for non-indexed searches
• MySQL Cluster 7.4 DBT2 benchmark
– 2.5M SQL Statements/Second
MySQL Cluster 7.4 SQL Performance
2.5M SQL Statements/Second
Chart: DBT2 SQL statements per second (0 to 3,000,000) versus number of data nodes (2 to 16).
MySQL Fabric
Scale out with Data Sharding + High Availability
• Scale-out through sharding
• Read AND Write
• Standard framework, no more custom solutions
• HA out of the box
• On top of Replication
• Automatic failover
• Automatic routing
Diagram: Application, Connector, MySQL Fabric, SQL, and two Master groups, each with Read-slaves.
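The sharding and routing ideas on this slide can be sketched in a few lines. This is an illustrative model only, not the MySQL Fabric API: the ShardGroup class, the server names, and the choice of hash-based sharding with round-robin reads are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass
class ShardGroup:
    """Hypothetical HA group: one master for writes, slaves for reads."""
    master: str
    slaves: List[str]

    def __post_init__(self) -> None:
        # Round-robin over slaves; fall back to the master if there are none.
        self._readers = cycle(self.slaves or [self.master])

    def server_for(self, write: bool) -> str:
        return self.master if write else next(self._readers)


def shard_index(key: str, shard_count: int) -> int:
    """Map a sharding key to a shard number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def route(groups: List[ShardGroup], key: str, write: bool) -> str:
    """Pick the group that owns the key, then a server for the operation."""
    return groups[shard_index(key, len(groups))].server_for(write)


# Hypothetical topology: two master groups, each with read slaves.
GROUPS = [
    ShardGroup("master-0", ["slave-0a", "slave-0b"]),
    ShardGroup("master-1", ["slave-1a"]),
]

if __name__ == "__main__":
    print(route(GROUPS, "customer:42", write=True))
    print(route(GROUPS, "customer:42", write=False))
```

A given key always lands on the same group, so writes stay consistent on that group's master while reads spread across its slaves. Automatic failover would amount to replacing a group's master entry when the current one is lost.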
Big Data Lifecycle
Better Decisions Using Big Data
ACQUIRE
ORGANIZE
CREATE VALUE
FROM DATA
Import Data
Apache Sqoop
Apache Sqoop
• Apache TLP, part of Hadoop project
• Originally developed by Cloudera
• Bulk data import and export
• Between Hadoop (HDFS) and external data stores
• JDBC Connector architecture
• Supports plug-ins for specific functionality
• “Fast Path” Connector developed for MySQL
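As a rough illustration of a bulk import, the sketch below assembles a Sqoop 1 command line for pulling one MySQL table into HDFS. The options shown (--connect, --username, -P, --table, --target-dir, --num-mappers, --direct) are standard Sqoop 1 import options, with --direct selecting the MySQL fast path; the host, database, user, table, and target directory are hypothetical placeholders. The script only builds and prints the command and does not run it.

```python
import shlex
from typing import List


def sqoop_import_args(
    jdbc_url: str,
    username: str,
    table: str,
    target_dir: str,
    direct: bool = True,
    mappers: int = 4,
) -> List[str]:
    """Build the argument list for a 'sqoop import' of a single table."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC connection string
        "--username", username,
        "-P",                           # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,     # HDFS directory for the imported files
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]
    if direct:
        args.append("--direct")         # use the MySQL fast-path connector
    return args


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    command = sqoop_import_args(
        "jdbc:mysql://db.example.com/shop", "etl", "orders", "/data/shop/orders"
    )
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the command as a list makes it straightforward to hand to a scheduler or to subprocess.run once a Sqoop installation is available.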
MySQL Applier for Hadoop
• Real-time streaming of events from MySQL to Hadoop
– Supports move towards “Speed of Thought” analytics
• Connects to the binary log, writes events to HDFS via libhdfs library
• Each database table mapped to a Hive data warehouse directory
• Enables eco-system of Hadoop tools to integrate with MySQL data
• Available for download now: labs.mysql.com
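The table-to-directory mapping described above can be pictured with a small sketch. It does not use the Applier's code or the binary log: it assumes Hive's default warehouse location (/user/hive/warehouse), Hive's "<database>.db" directory naming, Hive's default text field delimiter (Ctrl-A), and "\N" for NULL. The function names and sample row are hypothetical.

```python
import posixpath
from typing import Iterable, Optional

# Assumption: Hive's default warehouse location; deployments may override it.
HIVE_WAREHOUSE = "/user/hive/warehouse"

# Assumption: Hive's default text field delimiter (Ctrl-A).
FIELD_DELIMITER = "\x01"


def table_directory(database: str, table: str, warehouse: str = HIVE_WAREHOUSE) -> str:
    """Return the HDFS directory that would hold a table's data files."""
    return posixpath.join(warehouse, f"{database}.db", table)


def row_to_line(values: Iterable[Optional[object]], delimiter: str = FIELD_DELIMITER) -> str:
    """Format one inserted row as a delimited text line; NULL becomes \\N."""
    return delimiter.join("\\N" if v is None else str(v) for v in values)


if __name__ == "__main__":
    # Hypothetical insert event for table shop.orders.
    print(table_directory("shop", "orders"))
    print(row_to_line([1001, "widget", None], delimiter=","))
```

Appending one such line per insert event to a file in the table's directory is enough for Hive to see the new rows, which is how the rest of the Hadoop tool ecosystem gains access to MySQL data.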
MySQL Applier for Hadoop
Big Data Lifecycle
Better Decisions Using Big Data
ANALYZE
DECIDE
CREATE VALUE
FROM DATA
Analyze
Export Data
Decide
Analyze Big Data in Hadoop
MySQL Reporting Database for BI
Summary
• Create value from Big Data with MySQL
• MySQL + Hadoop: widely deployed solution (80% of Hadoop project)
• “Best of both worlds”: SQL + NoSQL Access; Schema-less data management
• Scale Out & data sharding with MySQL Fabric
• Tools and expertise to support you
Unlocking big data with Hadoop + MySQL
