ColumnStore Bulk Data Adapters
David Thompson, VP Engineering
Jens Rowekamp, Engineer
Streamline and simplify
the process of data ingestion
Motivation
Organizations need to make data available for analysis as soon as it arrives.
Enable machine learning results to be published and made accessible to business users through SQL-based tools.
Ease of integration, whether with custom code or ETL tools.
Bulk data adapters
Applications can use the bulk data adapters SDK to collect and write data - on-demand data loading
No need to copy CSV to a ColumnStore node - simpler
Bypass the SQL interface, parser, and optimizer - faster writes
MariaDB Server
ColumnStore UM
Application
ColumnStore PM ColumnStore PM ColumnStore PM
Write API Write API Write API
MariaDB Server
ColumnStore UM
Bulk Data Adapter
1. For each row
a. For each column
bulkInsert->setColumn
b. bulkInsert->writeRow
2. bulkInsert->commit
* Buffers 100,000 rows by default
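The loop above can be sketched in plain Python. `FakeBulkInsert` below is a hypothetical stand-in for pymcsapi's ColumnStoreBulkInsert, used only to illustrate the setColumn / writeRow / commit call sequence and the batch-flushing behaviour; it is not part of the real API.

```python
# Sketch of the bulk-insert pattern. FakeBulkInsert is a stand-in stub,
# not pymcsapi; it mimics the row buffering described above (the real
# adapter buffers 100,000 rows by default before flushing).
class FakeBulkInsert:
    def __init__(self, batch_size=100000):
        self.batch_size = batch_size
        self.buffer = []   # rows waiting to be flushed
        self.flushed = 0   # rows already written out
        self.current = {}  # column values for the row in progress

    def setColumn(self, index, value):
        self.current[index] = value
        return self

    def writeRow(self):
        self.buffer.append(self.current)
        self.current = {}
        if len(self.buffer) >= self.batch_size:
            self._flush()
        return self

    def _flush(self):
        self.flushed += len(self.buffer)
        self.buffer = []

    def commit(self):
        # commit flushes any remaining buffered rows
        self._flush()

# Usage: the same call sequence as the real API.
b = FakeBulkInsert(batch_size=2)
for i, name in [(1, "ABC"), (2, "DEF"), (3, "GHI")]:
    b.setColumn(0, i)
    b.setColumn(1, name)
    b.writeRow()
b.commit()
print(b.flushed)  # 3
```

With a batch size of 2, the second writeRow triggers a flush and commit flushes the remaining row.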
Language Bindings
● The API is C++11 based.
● Currently available on modern Linux distributions:
○ May be ported to Windows and Mac in a future release.
● Other language bindings are implemented using SWIG, which generates efficient, near-identical native bindings on top of the C++ library:
○ Java 8 (also providing Scala support).
○ Python 2 & 3.
○ Other language bindings can be implemented in the future.
System Configuration
● The adapter assumes the existence of a ColumnStore.xml file in the system in
order to determine the system topology, hosts, and ports for the PM nodes.
● If you are running on a ColumnStore node, the adapter works immediately.
● For a remote host, you will need to copy ColumnStore.xml from a server node.
● The adapter needs to be able to connect to the ProcMon (8800), WriteEngine (8630), and DBRMController (8616) ports.
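As a rough illustration of how an adapter might read topology information from ColumnStore.xml, the sketch below parses host and port entries from an XML fragment with Python's standard library. The element names used here (ProcMgr, DBRM_Controller, IPAddr, Port) are assumptions for the example, not a guaranteed schema.

```python
import xml.etree.ElementTree as ET

# Illustrative ColumnStore.xml fragment. The element names are
# assumptions for this sketch, not the file's documented schema.
CONFIG = """
<Columnstore>
  <ProcMgr><IPAddr>192.168.1.10</IPAddr><Port>8800</Port></ProcMgr>
  <DBRM_Controller><IPAddr>192.168.1.10</IPAddr><Port>8616</Port></DBRM_Controller>
</Columnstore>
"""

def read_endpoint(root, section):
    """Return (host, port) for a named configuration section."""
    node = root.find(section)
    return node.findtext("IPAddr"), int(node.findtext("Port"))

root = ET.fromstring(CONFIG)
host, port = read_endpoint(root, "ProcMgr")
print(host, port)  # 192.168.1.10 8800
```

In practice the real file is loaded from the install path (or the path passed to the ColumnStoreDriver constructor, described below) rather than from an inline string.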
Core Classes
The following classes provide the core interface:
● ColumnStoreDriver: Entry point / connection management
● ColumnStoreBulkInsert: Per table interface for writing a transaction
● ColumnStoreSystemCatalog: Table metadata retrieval
Language namespaces:
● C++ - mcsapi::
● Java - com.mariadb.columnstore.api
● Python - pymcsapi
Core Classes - ColumnStoreDriver
● Entry point and factory class for creating:
○ ColumnStoreBulkInsert objects to allow bulk write of a single transaction for a single table
○ ColumnStoreSystemCatalog object to allow retrieval of table and column metadata
● The default constructor looks for ColumnStore.xml in:
○ $COLUMNSTORE_INSTALL_DIR/etc/ColumnStore.xml (for non-root installs).
○ /usr/local/mariadb/columnstore/etc/ColumnStore.xml
● Alternatively, pass the path to ColumnStore.xml as a constructor argument to specify a non-standard location.
ColumnStoreDriver Examples
Java
import com.mariadb.columnstore.api.*;
..
ColumnStoreDriver d1, d2;
d1 = new ColumnStoreDriver();
d2 = new ColumnStoreDriver("/etc/cs2.xml");
Python
import pymcsapi
d1 = pymcsapi.ColumnStoreDriver()
d2 = pymcsapi.ColumnStoreDriver("/etc/cs2.xml")
C++
mcsapi::ColumnStoreDriver *d1, *d2;
d1 = new mcsapi::ColumnStoreDriver();
d2 = new mcsapi::ColumnStoreDriver("/etc/cs2.xml");
Core Classes - ColumnStoreBulkInsert
● Encapsulates bulk insert operations. Constructed for a single table and
transaction.
● Multiple instances can be created for multiple drivers, but only one can be active per table per ColumnStore instance.
● Error handling is important: if you fail to commit or rollback, a ColumnStore table lock will be left behind and must be released manually with the cleartablelock command.
○ resetRow can be used to clear the current row if an error occurs and you want to commit the prior rows.
● After completion, getSummary returns summary details.
ColumnStoreBulkInsert Examples
Java
import com.mariadb.columnstore.api.*;
..
ColumnStoreDriver d;
ColumnStoreBulkInsert b;
d = new ColumnStoreDriver();
try {
b = d.createBulkInsert("test", "t1",
(short)0, 0);
b.setColumn(0, 1);
b.setColumn(1, "ABC");
b.writeRow();
b.setColumn(0,2);
b.setColumn(1, "DEF");
b.writeRow();
b.commit();
} catch (ColumnStoreException e) {
b.rollback();
..
}
Python
import pymcsapi
d = pymcsapi.ColumnStoreDriver()
try:
    b = d.createBulkInsert("test", "t1", 0, 0)
    b.setColumn(0, 1)
    b.setColumn(1, "ABC")
    b.writeRow()
    b.setColumn(0, 2)
    b.setColumn(1, "DEF")
    b.writeRow()
    b.commit()
except RuntimeError as err:
    b.rollback()
C++
mcsapi::ColumnStoreDriver* d;
mcsapi::ColumnStoreBulkInsert* b;
d = new mcsapi::ColumnStoreDriver();
try {
b = d->createBulkInsert("test", "t1",
0, 0);
b->setColumn(0, (uint32_t)1);
b->setColumn(1, "ABC");
b->writeRow();
b->setColumn(0, (uint32_t)2);
b->setColumn(1, "DEF");
b->writeRow();
b->commit();
} catch (mcsapi::ColumnStoreError &e) {
b->rollback();
..
}
Core Classes - ColumnStoreSystemCatalog
● Allows retrieval of ColumnStore table and column metadata, enabling generic implementations.
ColumnStoreSystemCatalog Examples
Java
import com.mariadb.columnstore.api.*;
..
ColumnStoreDriver d;
ColumnStoreSystemCatalog c;
ColumnStoreSystemCatalogTable t;
ColumnStoreSystemCatalogColumn c1,c2;
d = new ColumnStoreDriver();
c = d.getSystemCatalog();
t = c.getTable("test", "t1");
int t1_cols = t.getColumnCount();
c1 = t.getColumn(0);
String c1_name = c1.getColumnName();
c2 = t.getColumn("area_code");
Python
import pymcsapi
d = pymcsapi.ColumnStoreDriver()
c = d.getSystemCatalog()
t = c.getTable("test", "t1")
t1_cols = t.getColumnCount()
c1 = t.getColumn(0)
c1_name = c1.getColumnName()
c2 = t.getColumn("area_code")
C++
mcsapi::ColumnStoreDriver* d;
mcsapi::ColumnStoreSystemCatalog c;
mcsapi::ColumnStoreSystemCatalogTable t;
mcsapi::ColumnStoreSystemCatalogColumn c1,c2;
d = new mcsapi::ColumnStoreDriver();
c = d->getSystemCatalog();
t = c.getTable("test", "t1");
uint16_t t1_cols = t.getColumnCount();
c1 = t.getColumn(0);
std::string c1_name = c1.getColumnName();
c2 = t.getColumn("area_code");
Core Classes - Bulk Insert
ColumnStoreDriver
char* getVersion()
ColumnStoreBulkInsert* createBulkInsert(..)
ColumnStoreSystemCatalog& getSystemCatalog()
ColumnStoreBulkInsert
uint16_t getColumnCount()
ColumnStoreBulkInsert* writeRow()
ColumnStoreBulkInsert* resetRow()
void commit()
void rollback()
ColumnStoreSummary& getSummary()
void setTruncateIsError(bool)
void setBatchSize(uint32_t)
bool isActive()
ColumnStoreBulkInsert* setColumn(uint16_t,
const std::string& value,..)
ColumnStoreBulkInsert* setColumn(uint16_t, uint64_t,..)
..
ColumnStoreSummary
double getExecutionTime()
uint64_t getRowsInsertedCount()
uint64_t getTruncationCount()
uint64_t getSaturatedCount()
uint64_t getInvalidCount()
ColumnStoreDateTime
ColumnStoreDateTime(..)
bool set(..)
ColumnStoreDecimal
ColumnStoreDecimal(..)
bool set(..)
Core Classes - System Catalog
ColumnStoreSystemCatalogColumn
uint32_t getOID()
const std::string& getColumnName()
uint32_t getDictionaryOID()
columnstore_data_types_t getType()
uint32_t getWidth()
uint32_t getPosition()
const std::string& getDefaultValue()
bool isAutoincrement()
uint32_t getPrecision()
uint32_t getScale()
bool isNullable()
uint8_t compressionType()
ColumnStoreSystemCatalogTable
const std::string& getSchemaName()
const std::string& getTableName()
uint32_t getOID()
uint16_t getColumnCount()
ColumnStoreSystemCatalogColumn& getColumn(const std::string&)
ColumnStoreSystemCatalogColumn& getColumn(uint16_t)
ColumnStoreDriver
char* getVersion()
ColumnStoreBulkInsert* createBulkInsert(..)
ColumnStoreSystemCatalog& getSystemCatalog()
ColumnStoreSystemCatalog
ColumnStoreSystemCatalogTable& getTable(const
std::string& schemaName, const std::string& tableName)
Use Cases
The Bulk Data Adapters are designed to make it easier to build integrations and streaming use cases such as:
- Kafka or messaging integration
- Exposing data import via an API
- ETL tool adapters
- Custom ETL logic
MariaDB has introduced a few specific streaming adapters (MaxScale CDC and Kafka) and we plan to build more in the future. For further details, please attend tomorrow's session, "Real-time Analytics With The New Streaming Data Adapters", at 8:40am.
Spark Connector
● Enables publishing of machine learning results from Spark DataFrames to
ColumnStore.
● Enables a best-of-breed approach:
○ In-memory machine learning algorithms in Spark.
○ Publish results to ColumnStore for ease of consumption with SQL tools such as Tableau.
● Supports both Scala and Python notebooks.
● To pull data from ColumnStore into Spark, use the JDBC connector and Spark SQL to read data.
○ In the future we plan to add a bulk read API.
● Requires adding JAR files to the Spark runtime configuration.
● Available as a Docker image for reference / easy evaluation.
Spark Connector Demo / Example
Spark Connector - Getting Started with Docker
git clone https://github.com/mariadb-corporation/mariadb-columnstore-docker.git
cd mariadb-columnstore-docker/columnstore_jupyter
docker-compose up -d
In your browser, open http://localhost:8888 and enter 'mariadb' as the password to log in to the Jupyter notebook application.
Thank you!
M|18 Ingesting Data with the New Bulk Data Adapters