Greenplum Database
Overview

Michael Crutcher

Greenplum Product Management

© Copyright 2012 EMC Corporation. All rights reserved.

Greenplum Unified Analytic Platform

GREENPLUM DATABASE

Industry Leading Database with
Massively Parallel Performance
To Empower your Analytics


Extreme Performance for Analytics
• Optimized for BI and analytics
– Deep integration with statistical packages
– High performance parallel implementations

• Simple and automatic
– Just load and query like any database
– Tables are automatically distributed across nodes

• Extremely scalable
– MPP shared-nothing architecture
– All nodes can scan and process in parallel
– Linear scalability by adding nodes
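
The idea of tables being automatically distributed across nodes can be illustrated with a minimal sketch. It assumes rows are assigned to segments by hashing a distribution-key column; the `crc32` hash, the function names, and the `customer_id` column are illustrative assumptions, not Greenplum's actual hashing or API.

```python
import zlib
from collections import defaultdict

def segment_for(key, num_segments):
    """Map a distribution-key value to a segment using a stable hash.

    crc32 is an illustrative stand-in, not Greenplum's real hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments

def distribute(rows, key, num_segments):
    """Spread rows across segments by hashing the chosen key column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments

# Hypothetical table with a customer_id distribution key.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
segments = distribute(rows, "customer_id", num_segments=4)
for seg_id in sorted(segments):
    print(seg_id, len(segments[seg_id]))
```

Because every row lands on exactly one segment, each segment can scan its own share without coordinating with the others, which is the property the shared-nothing bullet relies on.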


Performance Through Parallelism
[Architecture diagram]
• Master Servers: query planning & dispatch
• Network Interconnect
• Segment Servers: query processing & data storage
• External Sources: loading, streaming, etc.
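
The division of labor in this architecture (masters plan and dispatch, segments process their local data) can be sketched as follows. This is a toy model under stated assumptions: threads stand in for segment servers, and `segment_scan` and `master_query` are hypothetical names, not Greenplum components.

```python
from concurrent.futures import ThreadPoolExecutor

def segment_scan(segment_rows, min_amount):
    """Work done on one segment: scan local rows and pre-aggregate."""
    matching = [r["amount"] for r in segment_rows if r["amount"] >= min_amount]
    return len(matching), sum(matching)

def master_query(segments, min_amount):
    """Master role: dispatch the same work to every segment, then gather."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda rows: segment_scan(rows, min_amount), segments))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total

# Three hypothetical segments, each holding its own slice of the table.
segments = [
    [{"amount": 5}, {"amount": 50}],
    [{"amount": 70}],
    [{"amount": 20}, {"amount": 90}, {"amount": 10}],
]
result = master_query(segments, min_amount=20)
print(result)
```

Only the small partial results travel back to the master; the row data stays on the segment that stores it.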


Greenplum Delivers Choice & Flexibility

Greenplum Data Computing Appliance
• Choose Greenplum Database and/or Hadoop modules in ¼ rack increments
• Scale up by adding your choice of additional modules
• Minimal time to value

Greenplum Software Solutions
• Greenplum Database, Hadoop, & Chorus on your x86 hardware
• Flexibility for any workload or environment
• Perpetual or subscription licenses

Core Functionality

Component Overview

CLIENT ACCESS & TOOLS
• CLIENT ACCESS: ODBC, JDBC, OLEDB, MapReduce, etc.
• 3rd PARTY TOOLS: BI Tools, ETL Tools, Data Mining, etc.
• ADMIN TOOLS: Greenplum Command Center, Greenplum Package Manager

PRODUCT FEATURES
• LOADING & EXT. ACCESS: Petabyte-Scale Loading, Trickle Micro-Batching, Anywhere Data Access, External Table Support
• STORAGE & DATA ACCESS: Hybrid Storage & Execution (Row- & Column-Oriented), In-Database Compression, Multi-Level Partitioning, Indexes – Btree, Bitmap, etc.
• LANGUAGE SUPPORT: Comprehensive SQL, Native MapReduce, SQL 2003 OLAP Extensions, Programmable Analytics

GREENPLUM DATABASE ADAPTIVE SERVICES
• Multi-Level Fault Tolerance (RAID, Mirroring, DR with Data Domain Boost)
• Online System Expansion
• Workload Management
• Analytics Extensions (GeoSpatial, PL/R, PL/Java, PL/Python, PL/Perl)

CORE MPP ARCHITECTURE
• Shared-Nothing MPP
• Parallel Dataflow Engine
• Parallel Query Optimizer
• gNet™ Software Interconnect
• Polymorphic Data Storage™
• Scatter/Gather Streaming™ Data Loading


Most Powerful Data Loading Capabilities
• Industry leading performance at 10+ TB per hour per rack
• Scatter-Gather Streaming™ provides true linear scaling
• Support for both large-batch and continuous real-time loading strategies
• Enable complex data transformations "in-flight"
• Transparent interfaces to loading through support for files, applications, and services

[Chart: SINGLE RACK COMPARISON of load rates for Greenplum, Oracle Exadata, Netezza, and Teradata]
Greenplum load rates scale linearly with the number of racks; others do not.
For example, two racks = >20 TB/h

Polymorphic Table Storage™
[Diagram: TABLE 'CUSTOMER' partitioned by month, Mar '11 through Nov '11; older partitions are column-oriented for COLD DATA, the most recent partitions are row-oriented for HOT DATA]

• Storage types can be mixed within a table or database
– Four table types: heap, row-oriented AO, column-oriented AO, external

• Rich compression functionality, definable column by column
– Block compression: Gzip (levels 1-9), QuickLZ
– Stream compression: RLE (levels 1-4)

• Flexible indexing, partitioning, and more
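
Mixing storage types within one table can be pictured as a per-partition policy. The sketch below is an assumption-laden illustration: the three-month hot window, the `storage_for` helper, and the choice of gzip for cold partitions are hypothetical and only mirror the hot/cold split drawn on the slide.

```python
from datetime import date

HOT_MONTHS = 3  # assumption: the three most recent monthly partitions count as hot

def months_between(older, newer):
    """Whole months from `older` to `newer` (both first-of-month dates)."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def storage_for(partition_month, current_month, hot_months=HOT_MONTHS):
    """Pick a storage layout for one partition based on its age."""
    age = months_between(partition_month, current_month)
    if age < hot_months:
        # Hot data: row-oriented, uncompressed for frequent updates and lookups.
        return {"orientation": "row", "compression": None}
    # Cold data: column-oriented and compressed for scan-heavy analytics.
    return {"orientation": "column", "compression": "gzip"}

current = date(2011, 11, 1)
partitions = [date(2011, m, 1) for m in range(3, 12)]  # Mar '11 .. Nov '11
layout = {f"{p.year}-{p.month:02d}": storage_for(p, current)["orientation"]
          for p in partitions}
for month, orientation in layout.items():
    print(month, orientation)
```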

gNet Software Interconnect
• A supercomputing-based "soft-switch" responsible for
– Efficiently pumping streams of data between motion nodes during query-plan execution
– Delivering messages, moving data, collecting results, and coordinating work among the segments in the system


Parallel Query Optimizer
• Cost-based optimization looks for the most efficient plan
• Physical plan contains scans, joins, sorts, aggregations, etc.
• Global planning avoids sub-optimal 'SQL pushing' to segments
• Directly inserts 'motion' nodes for inter-segment communication

PHYSICAL EXECUTION PLAN FROM SQL OR MAPREDUCE
[Plan diagram: Gather Motion 4:1 (Slice 3) at the top, over Sort, HashAggregate, and HashJoin operators; Redistribute Motion 4:4 (Slice 1) and Broadcast Motion 4:4 (Slice 2) feed Hash nodes; leaves are Seq Scans on lineitem, orders, customer, and motion]
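
The three motion types named in the plan can be modeled in a few lines. This is a simplified sketch, not the optimizer's implementation: segments are plain lists, keys are assumed to be integers so Python's `hash` is stable, and all function and table names are illustrative.

```python
def redistribute(segments, key):
    """Redistribute Motion: rehash rows so equal keys meet on one segment.

    Assumes integer keys, for which Python's hash() is deterministic.
    """
    n = len(segments)
    out = [[] for _ in range(n)]
    for rows in segments:
        for row in rows:
            out[hash(row[key]) % n].append(row)
    return out

def broadcast(segments):
    """Broadcast Motion: every segment receives a full copy of the rows."""
    everything = [row for rows in segments for row in rows]
    return [list(everything) for _ in segments]

def gather(segments):
    """Gather Motion: collect all segment results in one place."""
    return [row for rows in segments for row in rows]

def local_hash_join(left, right, key):
    """Hash join executed independently on one segment."""
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]

orders = [[{"order_id": 1, "cust_id": 10}, {"order_id": 2, "cust_id": 11}],
          [{"order_id": 3, "cust_id": 10}]]
customers = [[{"cust_id": 11, "name": "B"}], [{"cust_id": 10, "name": "A"}]]

# Plan A: redistribute both inputs on the join key, then join locally.
o, c = redistribute(orders, "cust_id"), redistribute(customers, "cust_id")
plan_a = gather([local_hash_join(l, r, "cust_id") for l, r in zip(o, c)])

# Plan B: leave orders in place and broadcast the smaller table.
plan_b = gather([local_hash_join(l, r, "cust_id")
                 for l, r in zip(orders, broadcast(customers))])
print(sorted(r["order_id"] for r in plan_a), sorted(r["order_id"] for r in plan_b))
```

Both plans return the same rows; a cost-based optimizer would choose between them based on how much data each motion has to move.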

Analytics Overview

Analytical Capabilities Overview
[Layer diagram]
• Data Access & Query Layer: ODBC, JDBC, SQL
• Stored Procedures, SQL 2003 OLAP, MapReduce, In-Database Analytics
• Polymorphic Storage
• GREENPLUM DATABASE, GREENPLUM HD
• Greenplum gNet


In-Database Analytics: Categories
[Layer diagram]
• Data Access & Query Layer: ODBC, JDBC, SQL
• In-Database Analytics
– Embedded: GPDB Embedded Analytics
– Partner: SAS Scoring Accelerator, SAS/HPA High Performance Analytics
– Open-Source: Open Source Extensions
– User-written: User-Written Analytical Algorithms
• GREENPLUM DATABASE


Analytics Highlight: MADlib
• Scalable in-database analytics
• Data-parallel
– Mathematical Algorithms
– Statistical Algorithms
– Machine learning Algorithms
– Supports structured and unstructured data
• Open-source software
– Source Accessibility
– Converge business, academic, and open-source communities
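
What "data-parallel" means for a statistical algorithm can be shown with simple linear regression computed from per-segment partial sums. This is a sketch of the general pattern only, assuming a transition/merge/final split; it is not MADlib's code, and the function names are hypothetical.

```python
from functools import reduce

def transition(rows):
    """Per-segment step: accumulate sufficient statistics over local rows."""
    n = sx = sy = sxx = sxy = 0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge(a, b):
    """Combine two partial states; order does not matter."""
    return tuple(p + q for p, q in zip(a, b))

def final(state):
    """Turn the merged state into a least-squares slope and intercept."""
    n, sx, sy, sxx, sxy = state
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Points on y = 2x + 1, split across three hypothetical segments.
segments = [[(0, 1), (1, 3)], [(2, 5)], [(3, 7), (4, 9)]]
slope, intercept = final(reduce(merge, (transition(rows) for rows in segments)))
print(slope, intercept)
```

Each segment touches only its own rows, and only five numbers per segment need to be exchanged, regardless of table size.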

Manageability, Extensions

Easy Manageability for Big Data
• Single console for both Database and Hadoop
• Administration
– Start, Stop Database
– Recover, Rebalance Segments
• Interactive view of System Metrics
– Real-time
– Historic (Configurable by time period)
• In-depth view for System Health
– Hardware health
– Software (Database, Hadoop)
• Query Monitoring
– Search, Prioritize, Cancel Queries
– View Query's Execution Plan
• Workload Management
– Configure Resource Queues
– Prioritize Users


Easy Extension Installation
Greenplum Package Manager
Greenplum supports easy deployment of numerous extensions like MADlib, PL/Perl, PL/Java, PostGIS, etc.
[Diagram: Master Servers and Segment Servers]


High Performance gNet for Hadoop
Parallel Query Access
• Connect any data set in Hadoop to GP DB's SQL Engine
• Process Hadoop data in place
• Parallelize data import/export from/to Hadoop thanks to GP DB's market leading data sharing performance
• Supported formats:
– Text (compressed and uncompressed)
– Binary
– Proprietary/user-defined
• GP HD 1.x, GP MR 1.x, CDH3u2
[Diagram: gNet for Hadoop with Text, Binary, and User-Defined formats]

High Availability,
Back up, Support

High Availability
• GPDB cluster
– 2 Master servers
– Multiple Segment servers
• Segment servers support multiple database instances
– Primary instances that actively process queries
– Standby mirror instances
• Block level mirroring
– Low resource consumption
– Differential resync capable for fast recovery
[Diagram: Set of Active Segment Instances]
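
Why differential resync shortens recovery can be seen in a toy model of one primary/mirror pair. The sketch assumes the primary records which blocks changed while the mirror was down and copies only those on recovery; the `SegmentPair` class and its methods are hypothetical and do not describe Greenplum internals.

```python
class SegmentPair:
    """Toy model of a primary segment instance and its mirror."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.mirror_up = True
        self.dirty = set()  # block ids changed while the mirror was down

    def write(self, block_id, data):
        """Apply a write to the primary and mirror it if the mirror is up."""
        self.primary[block_id] = data
        if self.mirror_up:
            self.mirror[block_id] = data
        else:
            self.dirty.add(block_id)

    def resync(self):
        """Differential resync: copy only the blocks that changed."""
        copied = len(self.dirty)
        for block_id in self.dirty:
            self.mirror[block_id] = self.primary[block_id]
        self.dirty.clear()
        self.mirror_up = True
        return copied

pair = SegmentPair()
for block in range(100):
    pair.write(block, f"v1-{block}")
pair.mirror_up = False          # mirror goes offline
pair.write(3, "v2-3")
pair.write(7, "v2-7")
pair.write(3, "v3-3")           # same block changed twice, copied once
copied = pair.resync()
print(copied, pair.mirror == pair.primary)
```

Here only 2 of 100 blocks are copied on recovery, rather than the full segment.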


Backup/Restore with EMC Data Domain
• Integration options
– NFS: Data Domain device mounted as NFS storage
– DD Boost: Native, client-side deduplication. Supported in GPDB 4.2 and higher
• Drastic reduction in backup storage requirement
• Back up all segment servers in parallel directly to Data Domain
• Data Domain integrates seamlessly into standard Greenplum full backup, data export, and data restore procedures
[Diagram: Full Appliance + Data Domain, connected by Boost or NFS over 2 x 10 Gbit IP]
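
The storage reduction from client-side deduplication can be illustrated with a generic content-hash scheme. This sketch assumes fixed-size chunks and SHA-256 digests purely for illustration; it does not describe the actual DD Boost protocol, and `DedupStore`, `backup`, and `restore` are hypothetical names.

```python
import hashlib

class DedupStore:
    """Toy backup target that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.chunks = {}

    def has(self, digest):
        return digest in self.chunks

    def put(self, digest, chunk):
        self.chunks[digest] = chunk

def backup(data, store, chunk_size=4):
    """Client side: hash each chunk and send only chunks the store lacks."""
    recipe, sent_bytes = [], 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if not store.has(digest):
            store.put(digest, chunk)
            sent_bytes += len(chunk)
        recipe.append(digest)
    return recipe, sent_bytes

def restore(recipe, store):
    """Rebuild the original data from the list of chunk digests."""
    return b"".join(store.chunks[d] for d in recipe)

store = DedupStore()
first = b"AAAABBBBAAAACCCC"
recipe1, sent1 = backup(first, store)
second = b"AAAABBBBDDDD"
recipe2, sent2 = backup(second, store)
print(sent1, sent2, restore(recipe1, store) == first)
```

The second backup sends only the one chunk the store has not seen, which is where the reduction in backup storage and network traffic comes from.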


Backup/Restore with EMC Data Domain
Backup and restore between remote and primary sites
[Diagram: a Greenplum DCA with Data Domain at each site, linked by Data Domain Replication over LAN/WAN]
• Ideal for configurations with RPO and RTO requirements that can be specified in hours
• Supports:
– Collection Replication for DD Boost backup
– Directory-level replication for NFS backup
– Encryption over the WAN


Customer Support Services
• Remote Technical Support
– 24x7 technical support and remote troubleshooting
– Customer-managed case severity level
– Four-hour response objective

• Onsite Support (DCA Only)
– Installation of replacement parts
– Replacement parts shipped for next business day arrival
– GP SW upgrade included

• Proactive Service
– Secure remote monitoring for hardware (DCA)
– Notification of engineering technical advisories
– Built-in tools maximize stability and performance

• Secure Self-Help
– 24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates


Other Relevant Greenplum Sessions
Session (Presenter): Times
• Unified Analytics Platform Introduction (Brian Wilson): Tues 10:00-11:00, Thurs 1:00-2:00
• Greenplum Hadoop Overview (Susheel Kaushik): Mon 10:00-11:00, Wed 4:15-5:15
• Greenplum DCA Overview (Hanxi Chen): Mon 4:00-5:00, Thurs 10:00-11:00
• Greenplum Analytics Workbench (Apurva Desai): Wed 8:30-9:30, Thurs 10:00-11:00
• Analytics on Hadoop (Don Miner): Tues 11:30-12:30, Thurs 8:30-9:30
• Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers) (Mike Maxey): Wed 4:15-5:15, Thurs 11:30-12:30
• Analytics for Business Value: Collaboration (Josh Klahr): Mon 10:00-11:00, Wed 2:45-3:45
• Disruptive Data Science: How Data Science and Big Data are Transforming Business, IT and People (Annika Jimenez, David Dietrich): Tues 4:15-5:15, Thurs 11:30-12:30

Thank You
