Big Data and open-source RDBMSs
Alexander Tokarev
Who am I
1. Cloud architect:
- Solution design
- Best practices
- Performance tuning
2. Enterprise databases: Oracle since 2001, versions 8.1.7 to 19c
3. Cloud databases: Redshift, Athena, BigQuery, Snowflake
Agenda
1. Big Data
2. Postgres ecosystem
3. Other vendors
4. Conclusion
5. Q&A
Big Data
1. Velocity
2. Volume
3. Variety
4. …
5. …
6. Vxxxxxxxxx
Big Data in RDBMS
1. Velocity
2. Volume
Big Data ingestion nature
1. Batch/micro-batch
2. Real-time
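The two ingestion styles differ mainly in how many rows reach the database per write. A minimal Python sketch, assuming events arrive as an iterable and a hypothetical `batch_size` parameter controls how many rows are grouped into one load; `batch_size=1` degenerates into per-row (real-time) ingestion.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an event stream into lists of at most batch_size items.

    batch_size=1 behaves like per-row (real-time) ingestion;
    larger values trade latency for fewer, bigger writes.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    stream = range(7)  # stand-in for incoming events
    for batch in micro_batches(stream, batch_size=3):
        print(batch)  # each batch stands in for one bulk INSERT
```

Larger batches amortize per-write overhead (transactions, index maintenance), which is why bulk-oriented engines favor them; the price is the delay until a batch fills.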
Big Data nature
1. Time-series data: date, numeric metrics
- Mostly append-only
- Huge volume of inserts
2. Analytical data: date, dimensions, measures
- Many dimensions
- Dimensions are subject to updates
- Moderate ingestion speed
Big Data data models
1. Star-style normalized
2. Wide tables
3. Nested structures
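The three data models can hold the same facts in different shapes. A sketch using a hypothetical sales example (all class, table and column names are invented for illustration): a star-style fact table with dimension lookups, the wide table produced by joining them, and a nested structure grouped by one dimension.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sales example; names are illustrative only.


@dataclass
class StoreDim:  # dimension table
    store_id: int
    city: str


@dataclass
class ProductDim:  # dimension table
    product_id: int
    category: str


@dataclass
class SaleFact:  # fact table: measures plus keys pointing to dimensions
    day: str
    store_id: int
    product_id: int
    amount: float


def to_wide(facts: List[SaleFact], stores: Dict[int, StoreDim],
            products: Dict[int, ProductDim]) -> List[dict]:
    """Star -> wide table: join each fact with its dimensions into one flat row."""
    return [
        {
            "day": f.day,
            "city": stores[f.store_id].city,
            "category": products[f.product_id].category,
            "amount": f.amount,
        }
        for f in facts
    ]


def to_nested(rows: List[dict]) -> Dict[str, List[dict]]:
    """Wide table -> nested structure: group rows under their city."""
    nested: Dict[str, List[dict]] = {}
    for row in rows:
        child = {k: v for k, v in row.items() if k != "city"}
        nested.setdefault(row["city"], []).append(child)
    return nested
```

The star model keeps dimension updates cheap (one row changes), while the wide model avoids joins at query time at the cost of repeating dimension values in every row.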
Postgres ecosystem
1. Postgres
2. Citus
3. Timescale
4. Greenplum
Postgres
1. Classical RDBMS
2. Very good for OLTP
3. Our experience: max DB size 5 TB
4. Partitioning works well from PG 12 onward
5. Row-based database
6. Search conditions must be predefined
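Postgres declarative range partitioning (introduced in PostgreSQL 10) is the usual way to keep large time-based tables manageable. A minimal sketch that only generates the DDL as strings, assuming a hypothetical table with a `timestamptz` column and one partition per month; the table and column names are placeholders and are interpolated unquoted, so they must come from trusted code.

```python
from datetime import date
from typing import List


def month_start(d: date, offset: int) -> date:
    """First day of the month that is `offset` months after d's month."""
    idx = d.year * 12 + (d.month - 1) + offset
    return date(idx // 12, idx % 12 + 1, 1)


def monthly_partitions_ddl(table: str, column: str, first: date, months: int) -> List[str]:
    """Build DDL for a range-partitioned parent table plus one partition per month.

    Identifiers are interpolated as-is (hypothetical names, trusted input only).
    """
    stmts = [
        f"CREATE TABLE {table} ({column} timestamptz NOT NULL, "
        f"value double precision) PARTITION BY RANGE ({column});"
    ]
    for i in range(months):
        lo, hi = month_start(first, i), month_start(first, i + 1)
        # FROM is inclusive and TO is exclusive in PostgreSQL range partitions.
        stmts.append(
            f"CREATE TABLE {table}_{lo:%Y_%m} PARTITION OF {table} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return stmts


if __name__ == "__main__":
    for stmt in monthly_partitions_ddl("metrics", "ts", date(2021, 12, 1), 2):
        print(stmt)
```

Queries that filter on the partition key can then skip irrelevant months (partition pruning), and old months can be detached or dropped as whole tables.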
Types of Postgres improvements
1. Fork
+ more control
- falls behind: mainline Postgres ships major versions faster
2. Extension
+ stays in sync with the main version
- not all changes are possible
Timescale
1. Postgres extension
2. Time-series optimized database (like Prometheus)
3. Ingestion 10x faster than Postgres
4. Own partitioning implementation (hypertables split into chunks)
5. Time-series-oriented SQL
6. Attempts at stream processing with SQL
7. SQL is well suited to extracting data by date
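Extracting time-series data by date usually means grouping readings into fixed-width time buckets, which TimescaleDB exposes through its `time_bucket` SQL function. A Python sketch of the same idea, assuming timezone-aware UTC timestamps; aligning buckets to the Unix epoch is an assumption of this sketch, and TimescaleDB may align buckets to a different default origin.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

# Buckets are aligned to the Unix epoch here (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor an aware UTC timestamp to the start of its fixed-width bucket."""
    return ts - (ts - EPOCH) % width


def avg_by_bucket(points: Iterable[Tuple[datetime, float]],
                  width: timedelta) -> Dict[datetime, float]:
    """Average the values that fall into each bucket, ordered by bucket start."""
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in points:
        start = bucket_start(ts, width)
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

In SQL the equivalent is a `GROUP BY` on the bucketed timestamp; because the data is partitioned by time, such queries touch only the chunks covering the requested range.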
MPP databases
+ Very fast
+ Horizontally scalable
- Different modeling approach
- Need fast disks
- Managed offerings are hard to find
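The speed and horizontal scalability of MPP databases come from the same pattern: rows are hash-distributed across segments, each segment aggregates its own slice in parallel, and a coordinator merges the partial results. A toy Python sketch of that scatter/gather pattern, not of any particular engine; threads stand in for segment hosts, and all names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (distribution key, measure)


def distribute(rows: List[Row], segments: int) -> List[List[Row]]:
    """Scatter: place each row on a segment chosen by hashing its key."""
    parts: List[List[Row]] = [[] for _ in range(segments)]
    for key, value in rows:
        parts[zlib.crc32(key.encode("utf-8")) % segments].append((key, value))
    return parts


def local_sum(part: List[Row]) -> Dict[str, float]:
    """Each segment aggregates only the rows it stores."""
    acc: Dict[str, float] = {}
    for key, value in part:
        acc[key] = acc.get(key, 0.0) + value
    return acc


def mpp_sum(rows: List[Row], segments: int = 4) -> Dict[str, float]:
    """Sum measures per key: parallel local aggregation, then a gather/merge step."""
    parts = distribute(rows, segments)
    # Threads stand in for segment hosts working in parallel.
    with ThreadPoolExecutor(max_workers=segments) as pool:
        partials = list(pool.map(local_sum, parts))
    merged: Dict[str, float] = {}
    for partial in partials:  # gather on the coordinator
        for key, value in partial.items():
            merged[key] = merged.get(key, 0.0) + value
    return merged
```

Adding segments shrinks each local slice, which is where horizontal scaling comes from; it also explains the "different modeling approach": the distribution key must be chosen so that data spreads evenly and joins stay local.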
Citus
1. Postgres extension
2. Owned by Microsoft
3. MPP tailored for OLTP: sharding for fast ingestion
4. The vendor positions it for OLAP as well: not true
5. SQL is well suited to extracting data by shard key
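Why a filter on the shard key matters can be shown with a toy hash-sharded table: a query that supplies the key is routed to exactly one shard, while a query without it must fan out to every shard. This is a simplified Python sketch; it uses `crc32` modulo the shard count, which is not Citus's actual hashing scheme, and the class and tenant names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional


class ShardedTable:
    """Toy hash-sharded table: rows are placed by hashing a shard key."""

    def __init__(self, shard_count: int = 4) -> None:
        self.shards: List[Dict[str, List[dict]]] = [{} for _ in range(shard_count)]
        self.shards_touched = 0  # how many shards the last query visited

    def _shard_for(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def insert(self, key: str, row: dict) -> None:
        # Each insert touches a single shard, so ingestion scales out.
        self.shards[self._shard_for(key)].setdefault(key, []).append(row)

    def select(self, key: Optional[str] = None) -> List[dict]:
        if key is not None:
            # Filter on the shard key: route to exactly one shard.
            self.shards_touched = 1
            return list(self.shards[self._shard_for(key)].get(key, []))
        # No shard key: every shard must be scanned and results merged.
        self.shards_touched = len(self.shards)
        return [r for shard in self.shards for rows in shard.values() for r in rows]
```

Single-shard routing is what makes the OLTP-style workload fast; analytical queries that scan all shards lose that advantage.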
Greenplum
1. Postgres fork
2. MPP database optimized for OLAP workloads
3. Excellent partitioning with automatic data lifecycle management
4. Many compression algorithms
5. Very fast on clusters of up to 100 TB
6. Many integrations: files, HDFS
7. Rich SQL support, ideal for analytical queries
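Among Greenplum's compression options for column-oriented tables is `RLE_TYPE` run-length encoding, which pays off on sorted or low-cardinality columns such as dates. A generic Python sketch of the RLE idea; it does not reproduce Greenplum's on-disk format.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(column: List[T]) -> List[Tuple[T, int]]:
    """Run-length encode a column: consecutive repeats become (value, count)."""
    runs: List[Tuple[T, int]] = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, count) runs back into the original column."""
    return [value for value, count in runs for _ in range(count)]
```

RLE works per column because values of one column stored together repeat far more often than whole rows do, which is one reason columnar storage compresses well.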
Big data platforms
1. Set of open-source components
2. Components for storage, processing, management
3. Storage components differ by tier:
- in-memory for hot data
- MPP for warm data
- Hadoop for cold data
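Tiered storage boils down to a placement rule based on data age. A minimal Python sketch, assuming hypothetical thresholds (7 days hot, one year warm); real values depend on query patterns and budget.

```python
from datetime import date
from enum import Enum


class Tier(Enum):
    HOT = "in-memory"
    WARM = "MPP"
    COLD = "Hadoop"


# Hypothetical age thresholds, in days.
HOT_DAYS = 7
WARM_DAYS = 365


def tier_for(partition_day: date, today: date) -> Tier:
    """Pick a storage tier for a daily partition based on its age."""
    age = (today - partition_day).days
    if age <= HOT_DAYS:
        return Tier.HOT
    if age <= WARM_DAYS:
        return Tier.WARM
    return Tier.COLD
```

A lifecycle job would run such a rule periodically and move partitions whose tier has changed.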
Big data platform: Arenadata DB
ClickHouse
1. DBMS written from scratch
2. Written by developers for developers (Yandex.Metrica)
3. MPP columnar database optimized for wide tables
4. Extremely fast batch ingestion + in-memory buffer tables
5. Very fast queries on wide tables
6. Aggregations during ingestion + real-time materialized views
7. Clusters of hundreds of TB
8. Huge number of integrations: files, HDFS, Kafka, JDBC
9. Self-sufficient for ELT/ETL
10. Own SQL dialect
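ClickHouse's `Buffer` table engine keeps inserted rows in RAM and flushes them to a destination table once thresholds are met, and materialized views can aggregate each inserted block. A toy Python model of combining the two ideas, assuming a single hypothetical `max_rows` flush threshold (the real engine also has time- and byte-based thresholds); the class and field names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (page, hits)


class BufferedAggregator:
    """Toy model of buffered ingestion with aggregation on insert.

    Rows collect in an in-memory buffer; when it reaches max_rows the whole
    block is flushed, and an insert-time transform sums hits per page into
    the target (similar in spirit to a materialized view feeding a
    summing table).
    """

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows
        self.buffer: List[Event] = []
        self.totals: Dict[str, int] = defaultdict(int)
        self.flushes = 0

    def insert(self, page: str, hits: int = 1) -> None:
        self.buffer.append((page, hits))
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        for page, hits in self.buffer:  # aggregate the whole block at once
            self.totals[page] += hits
        self.buffer.clear()
        self.flushes += 1
```

Writing in large blocks and storing pre-aggregated totals is why queries over the aggregate stay fast even under heavy ingestion; the trade-off is that unflushed rows are not yet visible in the target.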
Summary
| Feature | Postgres | Citus | Timescale | Greenplum | ClickHouse |
|---|---|---|---|---|---|
| Columnar storage | - | - | - | + | + |
| SQL richness | ++ | + | + | + | - |
| Integrations | - | - | - | + | +++ |
| In-flight aggregations | - | - | -+ | - | +++ |
| Ingestion style | Per row | Per row | Per row | Bulk | Huge batches |
| Ingestion performance | + | + | ++++ | ++ | +++ |
| Wide table oriented | - | - | + | + | +++ |
| Star-schema oriented | - | - | - | + | - |
| Horizontal scaling | - | + | ++ | ++ | +++ |
| Managed in cloud | +++ | + | + | - | -+ |
More questions?

Relational databases for BigData

Editor's Notes

  • #3: My name is Alex, and I work for Russia's top bank, where I'm responsible for our cloud. I like Oracle and cloud databases.