MariaDB ColumnStore
Use Cases and Upcoming 1.1 Features
David Thompson
VP Engineering @ MariaDB
DB Tech Showcase Tokyo
September 7th 2017
What is MariaDB ColumnStore?
High performance columnar storage engine that supports a wide variety of analytical use cases in highly scalable distributed environments
Parallel query processing for distributed environments
Faster, More Efficient Queries
Single Interface for OLTP and analytics
Easy to Manage and Scale
Easier Enterprise Analytics
Power of SQL and Freedom of Open Source to Big Data Analytics
Better Price Performance
Better Price
Performance
Flexible deployment options
• Cloud and on-premises
• Runs on commodity hardware
• Open source, subscription-based pricing
No need to maintain a third platform
• Run analytics from the same SQL front end
• No need to update application code
• Leverage MariaDB's extensible architecture
High data compression
• More efficient at storing big data
• Less hardware
Chart: Commercial Data Warehouse vs. MariaDB ColumnStore (90.3% less per TB per year)
Easier Enterprise
Analytics
ANSI SQL
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditability
Full ANSI SQL
• No more SQL-“like” queries
• Supports complex joins, aggregation and window functions
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding new nodes as data grows
• Out-of-the-box connection to BI tools
Faster, More
Efficient Queries
Optimized for Columnar storage
• Columnar storage reduces disk I/O
• Blazing fast for read-intensive workloads
• Ultra fast data import
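The disk I/O claim can be illustrated with a small sketch. This is a toy model, not ColumnStore's on-disk format: the table, its four fixed-width 8-byte columns, and the helper names are all assumptions made for illustration. It compares how many bytes a query touching one column would scan in a row layout versus a column layout.

```python
import struct

# Hypothetical fact table: (order_id, customer_id, amount, region_code).
rows = [(i, i % 100, float(i) * 1.5, i % 4) for i in range(1000)]

ROW_FORMAT = "<qqdq"                    # four 8-byte fields per row (assumed layout)
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # bytes occupied by one full row
FIELD_SIZE = 8                          # every column is 8 bytes wide in this toy model


def row_store_bytes_scanned(num_rows: int) -> int:
    """A row layout reads whole rows, even if the query needs one field."""
    return num_rows * ROW_SIZE


def column_store_bytes_scanned(num_rows: int, columns_needed: int) -> int:
    """A column layout reads only the columns the query touches."""
    return num_rows * FIELD_SIZE * columns_needed


# Equivalent of: SELECT SUM(amount) FROM orders  (touches one of four columns)
amount_column = [r[2] for r in rows]    # the only column that has to be read
total = sum(amount_column)

row_bytes = row_store_bytes_scanned(len(rows))
col_bytes = column_store_bytes_scanned(len(rows), columns_needed=1)
print(f"sum={total} row layout={row_bytes} B column layout={col_bytes} B")
```

In this model the column layout scans a quarter of the bytes, before any compression is applied. Real savings depend on column widths, compression, and how many columns a query references.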
Parallel distributed query execution
• Distributes queries into a series of parallel operations
• Fully parallel, high-speed data ingestion
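A minimal sketch of the idea behind parallel distributed execution: each node aggregates its own slice of the data, and a coordinator merges the small per-node results. The partition contents and function names are hypothetical, and threads stand in for separate nodes; this is not how ColumnStore is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of one column held by one node.
partitions = [
    [12.0, 7.5, 3.0],
    [8.0, 1.5],
    [20.0, 4.0, 6.0, 2.0],
]


def partial_aggregate(values):
    """Runs per node: scan the local slice and return a small summary."""
    return (sum(values), len(values))


def combine(partials):
    """Runs on the coordinator: merge the per-node summaries."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return {"sum": total, "count": count, "avg": total / count}


# Threads stand in for nodes working on their slices at the same time.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(partial_aggregate, partitions))

result = combine(partials)
print(result)
```

Only the (sum, count) pairs travel to the coordinator, so the amount of data moved stays small no matter how large each slice is.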
Highly available analytic environment
• Built-in Redundancy
• Automatic fail-over
Parallel
Query Processing
MariaDB ColumnStore Architecture
• Massively parallel architecture
– Linear scalability as new nodes are added
• Horizontal scaling
– Add new data nodes as your data grows
– Continue read queries when adding new nodes
– Utilize MaxScale to load balance and provide a single front-end access point
Shared-Nothing Distributed Data Storage
Compressed by default
Diagram labels: MaxScale load balancer, User Module (UM), Performance Module (PM), data storage
ColumnStore Use Cases
MariaDB ColumnStore Use Cases
Financial Services Healthcare Telecommunications High Tech
Financial Services Industry
Industry Background
• Every customer interaction generates electronic records
• All transactions must be retained due to regulatory requirements
• Customer centric marketing became more important due to fierce competition
Why MariaDB ColumnStore
- Cost effective solution to archive all transactional data securely for regulatory compliance
- Fast data import from transactional database
- Easy to analyze the archived data with SQL based analytics
- Does not require DBA to index or partition data
Financial Services Industry
Use Cases
Regulatory Compliance
• Archive and retain historic transactional data
Fraud Detection
• Detect fraudulent or anomalous trades among millions of transactions per day
• Proactively identify risks and prevent billions in losses due to fraud
Trade Analytics
• Analyze 20-30 million quotes per day
• Identify trade patterns and predict the outcome
Healthcare / Life Science Industry
Industry Background
• Electronic Medical Record (EMR) usage is increasing 48% annually
• Increased adoption of big data for advanced research projects
• Data protection and privacy regulations
Why MariaDB ColumnStore
- Strong security features including role-based data access and audit plugin
- MPP architecture handles analytics on big data with high speed
- Easy to analyze archived data with SQL based analytics
- Does not require DBA to index or partition data
Healthcare / Life Science Industry
Use Cases
Genome analysis
• In-depth genome research for the dairy industry to improve production of milk and protein.
• Fast data load for large genome datasets (DNA data for 7 billion cows in US, 20GB per load)
Healthcare spending analysis
• Analyze 3TB of US health care spending for 155 conditions with 7 years of historical data
• Used Sankey diagrams, treemaps, and pyramid charts to analyze trends by age, sex, type of care, and condition
Viral disease analysis
• Used regional data with interactive map to identify Ebola disease spread
• The map displays not only the existing transmission of Ebola virus, but also the probability of occurrence
Visualization
IHME Visualizations library: http://www.healthdata.org/results/data-visualizations
Telecommunication Industry
Industry Background
• Extremely high digital traffic and bandwidth
• Complex service offerings (4G, 5G, Wifi, IoT)
• Customer centric / personalized service is critical due to low switching cost
• High churn rate
Why MariaDB ColumnStore
- ColumnStore supports time-based partitioning and time-series analysis
- Fast data load for real-time analytics
- MPP architecture handles analytics on big data with high speed
- Easy to analyze the archived data with SQL based analytics
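Time-based partitioning can be sketched as follows. This is an illustrative model only: the daily bucket granularity, the record values, and the function names are assumptions, not ColumnStore's actual partitioning mechanism. The point is that a query over a date range only scans the partitions that fall inside that range.

```python
from collections import defaultdict
from datetime import date


def partition_by_day(records):
    """Group (day, value) records into one bucket per day."""
    partitions = defaultdict(list)
    for day, value in records:
        partitions[day].append(value)
    return partitions


def query_range(partitions, start, end):
    """Sum values between start and end (inclusive), scanning only the
    partitions whose day falls inside the range."""
    scanned = [day for day in partitions if start <= day <= end]
    total = sum(sum(partitions[day]) for day in scanned)
    return total, len(scanned)


# Hypothetical call-duration records, keyed by the day they were logged.
records = [
    (date(2017, 9, 1), 120),
    (date(2017, 9, 1), 80),
    (date(2017, 9, 2), 200),
    (date(2017, 9, 3), 50),
    (date(2017, 9, 4), 75),
]

partitions = partition_by_day(records)
total, scanned = query_range(partitions, date(2017, 9, 2), date(2017, 9, 3))
print(f"total={total}, partitions scanned={scanned} of {len(partitions)}")
```

Here two of the four daily partitions are read. The others are skipped without inspecting their rows, which is what makes time-bounded queries over archived data cheap.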
Telecommunication Industry
Use Cases
Customer behavior analysis
• Analyze call data records (CDR) to segment customers based on their behavior
• Data-driven analysis for customer satisfaction
• Create behavior-based upsell or cross-sell opportunities
Network optimization
• Combine network performance data with internal data (CDR)
• Proactive services before the service is interrupted
Call data analysis
• Data size: 6TB
• Ingest 1.5 million rows of logs per day with 30 million texts and 3 million calls
• Call and network quality analysis
• Provide higher quality customer services based on data
High tech Industry
Industry Background
• High pressure to improve product quality and yield through various techniques (Six Sigma, JIT, Lean, etc.)
• Explosion of data due to monitoring and sensor device innovations through IoT
Why MariaDB ColumnStore
- Identify patterns from massive dataset to improve yield
- MPP architecture handles analytics on big data with high speed
- Easy to analyze the archived data with SQL based analytics
- Does not require DBA to index or partition data
High tech Industry
Use Cases
Yield analysis and optimization
• Run simulations to test semiconductor quality
• Chip designers use these tests to improve chip design and yield
• 3,000 tests run in parallel, generating 5 million to 30 million data points
Sensor Analytics
• Import data from multiple IoT sensors
• Run time series analysis to predict patterns and detect anomalies
• Correlate information from multiple sensors to predict machine failure
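One common way to detect anomalies in a sensor time series is a rolling z-score: compare each reading with the mean and standard deviation of the readings just before it. The sketch below is a generic illustration, not a technique the slides attribute to any customer; the window size, threshold, and sample temperatures are assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(readings, window=5, threshold=3.0):
    """Return the indexes of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as anomalous.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical temperature readings from one sensor, with a single spike.
temperatures = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.4, 70.2, 70.0, 69.8]
anomalies = rolling_zscore_anomalies(temperatures)
print(anomalies)
```

Only the spike at index 6 is flagged. The readings after it are not, because the spike inflates the standard deviation of the following windows.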
ColumnStore 1.1
ColumnStore 1.1
• After five 1.0.x maintenance releases bringing improved stability, 1.1 brings some exciting new major features!
• Some new components will be under LGPL and BSL licensing. Core ColumnStore engine and MariaDB server are GPL licensed.
• Release timeline: Beta in mid-September (Q3 2017), GA in Q4 2017
ColumnStore 1.1 Features
Data Engine: Columnar storage engine based on MariaDB Server 10.2
Streaming / API: Bulk import API to support programmatic and streaming writes
High Availability: Integrated GlusterFS support to provide storage HA for local disk
Analytics: User Defined Aggregate / Window Functions
Data Types: Text and Blob support
Ease of Use: Backup and Restore Tool
Performance: Improved query and memory handling
Security: Audit Plugin integration
Certifications: Tableau certification
Data Streaming: ColumnStore Data API
What:
• C++ API to directly write to PM nodes
• LGPL licensed
• Per table write
• Input data is C++ data structure in API calls
• Can run remotely from UM and PM servers
Benefits:
● Real-time streaming directly into distributed data store
● No need to move large CSV data files to UM/PM
● Enable non-CSV data sources for ColumnStore
● Run outside UM/PM. Build custom ETL applications …
Diagram: data sources (such as syslog) feed a data streaming application that uses the CS Data API library to write to the write engine on each PM node.
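The write path described on this slide can be modeled with a short sketch. It is a toy simulation, not the real ColumnStore Data API (which the slide describes as a C++ library): the `PMNode` and `BulkInsert` classes, the batch size, and the round-robin distribution of batches are all hypothetical. It shows a per-table writer that accepts in-memory rows, sends batches to PM write engines, and makes them visible on commit, with no CSV file involved.

```python
from dataclasses import dataclass, field


@dataclass
class PMNode:
    """Stand-in for the write engine on one PM node (hypothetical)."""
    name: str
    committed: list = field(default_factory=list)


class BulkInsert:
    """Toy per-table bulk writer. NOT the real ColumnStore Data API."""

    def __init__(self, table, pm_nodes, batch_size=2):
        self.table = table
        self.pm_nodes = pm_nodes
        self.batch_size = batch_size
        self._pending = {pm.name: [] for pm in pm_nodes}
        self._buffer = []
        self._next_pm = 0

    def write_row(self, row):
        """Accept one in-memory row and flush when the batch is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self._flush()

    def _flush(self):
        """Send the buffered batch to the next PM node (assumed round-robin)."""
        if not self._buffer:
            return
        pm = self.pm_nodes[self._next_pm % len(self.pm_nodes)]
        self._pending[pm.name].extend(self._buffer)
        self._buffer = []
        self._next_pm += 1

    def commit(self):
        """Flush the remainder and make all pending rows visible."""
        self._flush()
        for pm in self.pm_nodes:
            pm.committed.extend(self._pending[pm.name])
            self._pending[pm.name] = []


pms = [PMNode("pm1"), PMNode("pm2"), PMNode("pm3")]
bulk = BulkInsert("syslog_events", pms)
for i in range(5):
    bulk.write_row({"id": i, "msg": f"event {i}"})
bulk.commit()
counts = [len(pm.committed) for pm in pms]
print(counts)
```

Because the writer only needs network access to the PM nodes, a streaming application built this way could run on a separate host, which matches the "run outside UM/PM" benefit listed above.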
ColumnStore Data Adapters 1.1
What:
• Pre-packaged data adapters written using the CS Data API
• Convert data from a specific source into MariaDB ColumnStore
• BSL licensed
Benefits
● Out-of-the-box real-time data streaming into ColumnStore
● No need to move large CSV data files to UM/PM
● Enable non-CSV data sources for ColumnStore
● Run outside UM/PM. Build custom ETL applications
Diagram: the MaxScale CDC Adapter (MaxScale CDC API plus CS Data API library), fed by MaxScale from the MDB master, and the Avro Adapter (Kafka consumer interface plus CS Data API library) write to the write engine on each PM node.
User Defined Distributed Aggregates
What
• Enables creation of user defined functions for aggregates and window functions. 1.0 supports only user defined scalar functions.
• Implemented using C++ SDK and allows map / reduce work breakdown between UM and PM nodes.
Benefits
• Enables custom optimized analytical functions. For example:
– Sum of Squares (Σ x²)
– Median (distributed)
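The map / reduce breakdown for the two example aggregates can be sketched as follows. This is a Python illustration of the general idea, not the C++ SDK interface: the class names, the `partial` / `final` method names, and the sample slices are hypothetical. Sum of squares reduces to one number per node, while median needs each node to return its sorted values so the final step can merge them.

```python
import heapq


class SumOfSquares:
    """Decomposable aggregate: each node's partial result is one number."""

    @staticmethod
    def partial(values):
        # Per-node (map) step.
        return sum(v * v for v in values)

    @staticmethod
    def final(partials):
        # Coordinator (reduce) step.
        return sum(partials)


class Median:
    """Not reducible to one number per node: partials are sorted lists."""

    @staticmethod
    def partial(values):
        # Per-node step: sort the local slice.
        return sorted(values)

    @staticmethod
    def final(partials):
        # Coordinator step: merge the sorted slices and pick the middle.
        merged = list(heapq.merge(*partials))
        n = len(merged)
        mid = n // 2
        if n % 2:
            return merged[mid]
        return (merged[mid - 1] + merged[mid]) / 2


# Hypothetical slices of one column, one per PM node.
pm_slices = [[3, 1, 4], [1, 5], [9, 2, 6]]

sum_sq = SumOfSquares.final([SumOfSquares.partial(s) for s in pm_slices])
median = Median.final([Median.partial(s) for s in pm_slices])
print(sum_sq, median)
```

The contrast shows why a user defined aggregate needs control over both steps: the author decides what each node returns and how the results are combined.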
Built-in Data Redundancy for Local Storage
What:
• Enables auto-configuration of GlusterFS as storage filesystem for PM data.
• Guided option during install, allows specification of data redundancy factor (2 or more) and automated layout of data brick locations.
• If a PM node fails, then another node with a copy of the data block takes over.
Benefits:
● Provide data HA for on-premises customers without network storage appliances (or cloud providers with low performing networked filesystems).
Diagram: a UM above PM 1, PM 2 and PM 3 on GlusterFS. Each PM holds one data block (1, 2 or 3) plus copies of the other two data blocks.
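The redundancy and failover behavior can be sketched with a small model. This is an illustration under stated assumptions, not GlusterFS's actual brick layout algorithm: the ring-order placement, the block naming, and both function names are hypothetical.

```python
def place_replicas(pm_nodes, redundancy):
    """Assign each node's data block to itself plus the next
    `redundancy - 1` nodes in ring order (hypothetical layout)."""
    if not 2 <= redundancy <= len(pm_nodes):
        raise ValueError("redundancy must be between 2 and the node count")
    layout = {}
    for i in range(len(pm_nodes)):
        block = f"block{i + 1}"
        layout[block] = [
            pm_nodes[(i + k) % len(pm_nodes)] for k in range(redundancy)
        ]
    return layout


def serving_node(layout, block, failed):
    """Pick the first surviving node that holds a copy of `block`."""
    for node in layout[block]:
        if node not in failed:
            return node
    raise RuntimeError(f"all copies of {block} are unavailable")


# Three PM nodes with a redundancy factor of 3, as in the diagram.
layout = place_replicas(["pm1", "pm2", "pm3"], redundancy=3)
print(layout)

# pm1 fails: another node holding a copy of block1 takes over.
takeover = serving_node(layout, "block1", failed={"pm1"})
print(takeover)
```

With a redundancy factor of 2 the same model tolerates one failed node per block. Higher factors trade disk space for tolerance of more simultaneous failures.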
Where to find MariaDB ColumnStore?
SOFTWARE DOWNLOAD: https://mariadb.com/downloads/mariadb-ax
SOURCE: https://github.com/mariadb-corporation/mariadb-columnstore-engine
DOCUMENTATION: https://mariadb.com/kb/en/mariadb/mariadb-columnstore/
BLOGS: https://mariadb.com/blog-tags/columnstore
Thank you

[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use cases and new features coming in 1.1 by MariaDB Corporation - David Thompson
