Public
Column Oriented Databases
Arundhati Kanungo, Developer Associate, SAP Labs India Pvt. Ltd.
June 25, 2017
© 2016 SAP SE or an SAP affiliate company. All rights reserved.
Agenda
• Definition of Column Oriented Databases
• History of Column Oriented Databases
• Working of Column Oriented Databases
• Top 3 Best-Selling Column Oriented Databases
• Advantages of Column Oriented Databases
• Disadvantages of Column Oriented Databases
• Row vs Column Oriented Databases
• Industries to benefit from Column Oriented Databases
• Conclusion
• Future Work
Column Oriented Databases -
Definition
• Database management systems that store table data as columns of data rather than as rows
of data
• Offer advantages for data warehousing, customer relationship management (CRM), library
card catalogs, and other ad hoc inquiry systems where aggregates are computed over large
numbers of similar data items
• The term "column oriented" refers to both the ease of expressing a column oriented
structure and the focus on optimizations for column oriented workloads
• Common examples include Greenplum Database, Calpont InfiniDB, Accumulo, Teradata,
SenSage, EXASOL, MonetDB, RCFile, Sqrrl, etc.
Column Oriented Databases -
History
• 1969
• TAXIR, the first application of a column oriented database storage system, was developed with a focus on
information retrieval in biology.
• 1976
• Statistics Canada implemented the RAPID system for processing and retrieval of the Canadian Census of
Population and Housing as well as several other statistical applications.
• 1977 - 1990
• The RAPID system was shared with other statistical organizations throughout the world for widespread use in
the 1980s. It was used by Statistics Canada until the 1990s.
• 1993
• KDB was launched as the first commercially available column oriented database.
• 1995
• Sybase IQ followed as the second commercially available column oriented database.
• 2005
• The column oriented database landscape has changed rapidly since 2005, with many open source and
commercial implementations.
Column Oriented Databases -
Working
EmpId  LastName  FirstName  Salary
10     Smith     Joe        40000
12     Jones     Mary       50000
11     Johnson   Cathy      44000
22     Jones     Bob        55000
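To minimize seeks, a column oriented database serializes all values of a column together,
then the values of the next column, and so on. For our example table, with row ids 001 to 004
assigned in order, the data would be stored as below.
10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;
Each serialized column resembles an index in a row store, which leads to the common but
mistaken belief that a column-oriented store "is really just" a row-store with an index on every
column. The mapping differs: a row-store index maps indexed values to row ids, whereas in a
column store the column values are the primary data, mapped from row ids.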
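The serialization described above can be sketched in a few lines of Python. This is a minimal illustration of the layout only, not how any particular product stores data; the function names `serialize_rows` and `serialize_columns` are hypothetical, and the three-digit row ids are assumed to be assigned in insertion order, as in the example.

```python
# Example table from the slide; row ids 001..004 are assumed to follow insertion order.
ROWS = [
    (10, "Smith", "Joe", 40000),
    (12, "Jones", "Mary", 50000),
    (11, "Johnson", "Cathy", 44000),
    (22, "Jones", "Bob", 55000),
]


def serialize_rows(rows):
    """Row-oriented layout: all values of one record are stored together."""
    return "".join(
        f"{row_id:03d}:" + ",".join(str(value) for value in row) + ";"
        for row_id, row in enumerate(rows, start=1)
    )


def serialize_columns(rows):
    """Column-oriented layout: all values of one column are stored together."""
    parts = []
    for column in zip(*rows):  # transpose the rows into columns
        pairs = (f"{value}:{row_id:03d}" for row_id, value in enumerate(column, start=1))
        parts.append(",".join(pairs) + ";")
    return "".join(parts)


print(serialize_rows(ROWS))
print(serialize_columns(ROWS))
```

The second line printed reproduces the column-wise string shown in the example; the first shows the same records in a row-wise layout for comparison.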
Top 3 Column Oriented
Databases
• Sybase IQ
• Aims to deliver high-end performance for critical analytics, business intelligence, and data warehousing solutions
using a highly optimized server dedicated to analytics.
• Is column oriented with a grid-based architecture, patented data compression, and an advanced query optimizer, and
thereby delivers high performance, flexibility, and economy in challenging reporting and analytics
environments.
• Infobright
• Combines a column oriented database with its Knowledge Grid architecture to deliver a self-managed,
scalable, high-performance analytics query platform.
• Industry-leading data compression (10:1 up to 40:1) considerably lowers storage requirements and the need for
expensive hardware infrastructure.
• Vertica
• Purpose-built platform for data with high-performance, real-time analytics needs.
• Highly scalable, with an integration ecosystem and capabilities such as data loading, queries, columnar
storage, MPP architecture, and data compression.
Column Oriented Databases –
Advantages
• Scalability and fast data loading for Big Data
• Accessible by many third-party BI analytic tools
• Simple systems administration
• High performance on aggregation queries (like COUNT, SUM, AVG, MIN, MAX)
• Highly efficient data compression and/or partitioning
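The last two advantages can be illustrated with a small Python sketch. It assumes a table held as one list per column and uses run-length encoding as a stand-in for "data compression"; the slides do not say which compression scheme a given product uses, so that choice and the helper names `aggregate` and `run_length_encode` are assumptions made for illustration.

```python
from itertools import groupby

# Column-oriented layout: one list per column (values from the example table).
table = {
    "EmpId": [10, 12, 11, 22],
    "LastName": ["Smith", "Jones", "Johnson", "Jones"],
    "FirstName": ["Joe", "Mary", "Cathy", "Bob"],
    "Salary": [40000, 50000, 44000, 55000],
}


def aggregate(column):
    """Compute common aggregates while touching only the one column passed in."""
    return {
        "COUNT": len(column),
        "SUM": sum(column),
        "AVG": sum(column) / len(column),
        "MIN": min(column),
        "MAX": max(column),
    }


def run_length_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


# Only the Salary column is read; the other three columns are never touched.
salary_stats = aggregate(table["Salary"])

# Sorting places equal values next to each other, which is what makes runs long.
encoded_last_names = run_length_encode(sorted(table["LastName"]))

print(salary_stats)
print(encoded_last_names)
```

Because similar values sit next to each other within a column, encodings of this kind tend to work better on columnar data than on interleaved row data.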
Column Oriented Databases –
Disadvantages
• Record updates and deletes reduce storage efficiency
• Effective partitioning/indexing schemes can be difficult to design
• Transactional workloads are a poor fit and are not supported by some systems
• Queries with table joins can reduce performance
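The first point above can be sketched as follows. This is a toy model, not any vendor's storage engine: the class `ColumnTable` and its methods are hypothetical, and it assumes that deletes are recorded as markers rather than applied in place, so the space they occupy is not reclaimed.

```python
class ColumnTable:
    """Toy column store: one list per column plus a set of deleted positions."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.deleted = set()  # assumption: deletes are only marked, not compacted

    def insert(self, record):
        """Append one record; returns how many separate column lists were written."""
        for name, values in self.columns.items():
            values.append(record[name])
        return len(self.columns)

    def delete(self, position):
        """Mark a row as deleted; its values still occupy space in every column."""
        self.deleted.add(position)

    def live_fraction(self):
        """Share of stored rows that are still live (1.0 means no wasted space)."""
        total = len(next(iter(self.columns.values())))
        return (total - len(self.deleted)) / total if total else 1.0


employees = ColumnTable(["EmpId", "LastName", "FirstName", "Salary"])
writes = [
    employees.insert({"EmpId": 10, "LastName": "Smith", "FirstName": "Joe", "Salary": 40000}),
    employees.insert({"EmpId": 12, "LastName": "Jones", "FirstName": "Mary", "Salary": 50000}),
    employees.insert({"EmpId": 11, "LastName": "Johnson", "FirstName": "Cathy", "Salary": 44000}),
    employees.insert({"EmpId": 22, "LastName": "Jones", "FirstName": "Bob", "Salary": 55000}),
]
employees.delete(1)  # remove the second record

print(writes)                     # each insert writes to every column
print(employees.live_fraction())  # part of the stored data is now dead space
```

In this model every single-record insert touches all four columns, and a delete leaves dead values behind until some separate clean-up step rewrites the columns.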
Row vs Column Oriented
Databases
Comparative Analysis
Row Oriented Databases
• Relatively less efficient for aggregate computation and for selecting or updating
a few columns across many rows, but efficient for single-record inserts
• Well-suited for OLTP (On-Line Transaction Processing)-like
workloads
Column Oriented Databases
• Highly efficient for aggregate computation and for selecting or updating
a few columns across many rows, but less efficient for single-record inserts
• Well-suited for OLAP (On-Line Analytical Processing)-like
workloads
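The trade-off between the two layouts can be illustrated with a deliberately simplified cost model in Python. It counts only values scanned and storage locations written, and ignores pages, caching, compression, and indexes; the function names and the one-million-row table size are assumptions chosen for illustration.

```python
def values_read_for_aggregate(layout, n_rows, n_cols):
    """Values scanned to aggregate ONE column over all rows (simplified model)."""
    if layout == "row":
        return n_rows * n_cols  # every record is read in full
    if layout == "column":
        return n_rows  # only the aggregated column is read
    raise ValueError(f"unknown layout: {layout!r}")


def locations_written_for_insert(layout, n_cols):
    """Separate storage locations written to insert ONE record (simplified model)."""
    if layout == "row":
        return 1  # the record is stored contiguously
    if layout == "column":
        return n_cols  # one append per column
    raise ValueError(f"unknown layout: {layout!r}")


N_ROWS, N_COLS = 1_000_000, 4  # hypothetical table size

for layout in ("row", "column"):
    print(
        layout,
        values_read_for_aggregate(layout, N_ROWS, N_COLS),
        locations_written_for_insert(layout, N_COLS),
    )
```

Under these assumptions the column layout scans a quarter of the values for an aggregate (an OLAP-style operation), while the row layout writes to a single location for an insert (an OLTP-style operation).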
Column Oriented Databases –
Benefited Industries
• Telecommunications: Improves response time to the customer by reducing input and output
(I/O)
• Financial Services: Supports the high-performance, millisecond query response times
required for inbound market
• Retail: Reads only the data referenced in a query, driving higher performance and lower
processing costs compared with reading all the columns in the table
Conclusion
The column oriented database is a key technology that delivers high business value by helping
enterprises adapt their information infrastructure to the evolving demands for timely, reliable
intelligence to run the business. In addition, it has far-reaching implications for the design of
systems, and offers major cost savings, including on power and cooling requirements.
Future Work
Columnar databases can be very helpful in big data projects. Big data is one of the biggest
challenges faced today. When we have a large volume and variety of random real-time data, we
might want to use a columnar database to exploit its flexibility, performance, and scalability.
HBase is one of the column oriented stores most widely used with big data. I look forward to
carrying out a comparative analysis of the performance of HBase and other columnar databases
when fed with big data.
References
1. Stonebraker et al., “C-Store: A Column-oriented DBMS”, Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
2. Copeland, George P. and Khoshafian, Setrag N., “A Decomposition Storage Model”, SIGMOD '85, 1985
3. Daniel Abadi and Samuel Madden, "Debunking Another Myth: Column-Stores vs. Vertical Partitioning", The Database Column, 31 July 2008
4. Stavros Harizopoulos, Daniel Abadi and Peter Boncz, "Column-Oriented Database Systems", VLDB 2009 Tutorial, p. 5
5. Pat & Betty O’Neil, Xuedong Chen and Stephen Revilak, “The Star Schema Benchmark and Augmented Fact Table Indexing”, TPC Technology
Conference 8/24/09
6. D. J. Abadi, S. R. Madden, N. Hachem, “Column-stores vs. Row-stores: How Different are They Really?”, in: SIGMOD’08, 2008, pp. 967–980.
7. N. Bruno, “Teaching an old elephant new tricks”, in: CIDR ’09, 2009.
8. Daniel Lemire, Owen Kaser, Kamel Aouiche, "Sorting Improves Word-Aligned Bitmap Indexes", Data & Knowledge Engineering, Volume 69,
Issue 1 (2010), pp. 3-28.
9. Daniel Lemire and Owen Kaser, “Reordering Columns for Smaller Indexes”, Information Sciences 181 (12), 2011
10. Slezak et al., “Brighthouse: An Analytic Data Warehouse for Ad hoc Queries”, Proceedings of the 34th VLDB Conference, Auckland, New
Zealand, 2008
11. Estabrook, Brill, “The Theory of the TAXIR Accessioner”, Mathematical Biosciences, Volume 5, Issues 3–4, Elsevier B.V.
12. Turner, Hammond, Cotton, “A DBMS for Large Statistical Databases”, Proceedings of VLDB 1979, Rio de Janeiro, Brazil
Thank you
Contact information:
Arundhati Kanungo
Developer Associate
SAP Labs India Pvt. Ltd.
Arundhati.Kanungo@sap.com
+91 7406313166
Any Questions???
