Maximize Greenplum For Any Use Cases Decoupling Compute and Storage - Greenplum Summit 2019
© Copyright 2019 Pivotal Software, Inc. All rights Reserved.
Shivram Mani Francisco Guerrero
@shivram @frankgh
Maximize Greenplum
For Any Use Cases
Decoupling Compute and Storage
Agenda
■ Enterprise Data Landscape
■ Accessing External Data from Greenplum
■ Platform Extension Framework (PXF)
■ Use Cases
■ Q+A
Enterprise Data Landscape
The Wild Wild West of Data
Greenplum uses PXF as a federated query engine to access external heterogeneous data.
Platform Extension Framework (PXF)
● Tabular view for heterogeneous data
● Built-in connectors for various data sources/formats
● Pluggable framework
● Parallel high throughput data access
● Open source
● Read and write external data
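Architecture of PXF
Figure: a master host and two segment hosts. Segment host 1 runs seg1, seg2 and seg3; segment host 2 runs seg4, seg5 and seg6. Each segment host runs its own PXF instance, which accesses the external data.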
Q: How can I access sales data residing in an S3 bucket stored in parquet format?
Greenplum External Table
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION
('pxf://s3-bucket/2018/sales/?PROFILE=s3:parquet&SERVER=s3_sales')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
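In the LOCATION URI, s3-bucket/2018/sales/ is the path to the data, PROFILE=s3:parquet is the profile, and SERVER=s3_sales is the server.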
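As a minimal sketch of how the three parts of the LOCATION URI fit together, the Python helper below assembles the same statement shown above. The function names `pxf_location` and `external_table_ddl` are hypothetical and are not part of Greenplum or PXF; the code only builds a string and does not connect to a database.

```python
def pxf_location(path: str, profile: str, server: str) -> str:
    """Assemble a pxf:// LOCATION URI from path, profile and server."""
    return f"pxf://{path}?PROFILE={profile}&SERVER={server}"


def external_table_ddl(table: str, columns: dict, location: str) -> str:
    """Render a readable external table statement in the shape used above."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        f"({cols})\n"
        f"LOCATION\n"
        f"('{location}')\n"
        f"FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');"
    )


# Values taken from the sales example on the slide.
location = pxf_location("s3-bucket/2018/sales/", "s3:parquet", "s3_sales")
ddl = external_table_ddl(
    "sales",
    {"cust": "int", "sku": "text", "amount": "decimal", "date": "date"},
    location,
)
print(ddl)
```

Keeping the path, profile and server as separate arguments mirrors the slide's point that each one can change independently of the table schema.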
How can we scale performance when querying remote data?
Performance - Predicate Pushdown
SELECT item, amount FROM orders
WHERE state = 'CA'
Figure: MASTER passes the predicate (state=CA) to SEGMENT. PXF with JDBC reads a row-oriented storage format holding rows with state=NY, state=NJ, state=CA and state=CA, and only the rows matching {state='CA'} come back.
● Predicate information is pushed to the external system
● External engines can apply the predicates in their own queries (e.g. JDBC)
● No filtering within PXF itself
● Partition pruning (e.g. Hive)
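The effect of predicate pushdown can be illustrated with a small, self-contained Python simulation. This is only a sketch under stated assumptions: `REMOTE_ROWS` and `scan_remote` are hypothetical stand-ins for an external system, the item and amount values are made up, and no PXF code is involved. The states match the four rows on the slide.

```python
# Rows held by the hypothetical remote source (states as on the slide).
REMOTE_ROWS = [
    {"item": "a", "amount": 10, "state": "NY"},
    {"item": "b", "amount": 20, "state": "NJ"},
    {"item": "c", "amount": 30, "state": "CA"},
    {"item": "d", "amount": 40, "state": "CA"},
]


def scan_remote(predicate=None):
    """Stand-in for the external system: it applies the predicate itself."""
    if predicate is None:
        return list(REMOTE_ROWS)
    column, value = predicate
    return [row for row in REMOTE_ROWS if row[column] == value]


# Without pushdown: every row is transferred, then filtered locally.
transferred_all = scan_remote()
filtered_locally = [row for row in transferred_all if row["state"] == "CA"]

# With pushdown: the predicate travels to the source, only matches come back.
transferred_pushed = scan_remote(("state", "CA"))

print(len(transferred_all), len(transferred_pushed))
```

Both paths produce the same result rows; the difference is how many rows leave the remote system, which is where the saving comes from.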
Performance - Column Projection
SELECT item, amount FROM orders
WHERE state = 'CA'
Figure: MASTER passes columns (item, amount), predicates (state=CA) and aggregates (count) to SEGMENT. PXF with Hive/ORC reads a columnar storage format with the columns date, item, amount and state, and fetches only {item, amount, state='CA'}.
● Propagate column projection metadata to external systems
● JDBC, Parquet & ORC
● Reduces network I/O
● Reduces remote disk I/O
● Improved performance for aggregate queries
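Column projection can be sketched the same way. The Python example below models a columnar file as one list per column and reads only the columns the query needs. `COLUMNS`, `read_columns` and `query` are hypothetical names and the data is invented for illustration; real ORC or Parquet readers are far more involved.

```python
# A hypothetical columnar file: one list per column, as in ORC or Parquet.
COLUMNS = {
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "item": ["a", "b", "c"],
    "amount": [10, 20, 30],
    "state": ["CA", "NY", "CA"],
}


def read_columns(names):
    """Read only the requested columns; the others are never touched."""
    return {name: COLUMNS[name] for name in names}


def query(projected, predicate_column, predicate_value):
    """SELECT <projected> ... WHERE <predicate_column> = <predicate_value>."""
    # Columns needed: the projected ones plus the one the predicate tests.
    needed = list(dict.fromkeys([*projected, predicate_column]))
    data = read_columns(needed)
    keep = [
        i for i, value in enumerate(data[predicate_column])
        if value == predicate_value
    ]
    rows = [tuple(data[name][i] for name in projected) for i in keep]
    return needed, rows


columns_read, rows = query(["item", "amount"], "state", "CA")
print(columns_read)
print(rows)
```

The date column is never read, which corresponds to the reduced remote disk and network I/O listed on the slide.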
Use Cases
Use Case: Multi-temperature data querying
● Storage based on operational requirements
● Can I work with data created a few seconds ago?
● Can I run a report on data from a few days ago?
● Can I inspect the data archived months or years ago?
Figure: HOT DATA in an in-memory database, WARM DATA in an RDBMS, COLD DATA in a data lake.
Use Case: Elastic scaling with Greenplum
● Greenplum on K8s for elastic compute
● Elastic storage with S3/Azure/Google
● Ability to separate compute from storage
● On-demand data warehouses
Use Case: Access Heterogeneous data on multiple clouds
● Different cloud providers based on business requirements
● Low cost storage
● No storage admin
● Data doesn’t need to be copied
Use Case: Access Heterogeneous data on multiple clouds
SELECT * FROM historical_orders o, product_catalog p
WHERE o.product_id = p.product_id
Figure: the Historical_Orders, Historical_Invoices and Product_Catalog tables are stored across the buckets s3-bucket-orders and s3-bucket-price. An admin migrates data from s3-bucket-orders to Azure Blob Storage, and the same query is shown before and after the migration.
Use Case: Access Heterogeneous data on multiple clouds
Historical_orders table data on S3:
CREATE EXTERNAL TABLE historical_orders
(item int, amount money)
LOCATION
('pxf://s3-bucket-orders/path?PROFILE=s3:parquet&SERVER=s3_orders')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
Historical_orders table data now on Azure Data Lake:
CREATE EXTERNAL TABLE historical_orders
(item int, amount money)
LOCATION
('pxf://my.azuredatalakestore.net/path?PROFILE=adl:parquet&SERVER=azure')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
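To make the migration concrete, the following Python sketch renders both definitions of historical_orders and lists the lines that differ. It is an illustration only: the helper name `external_table_ddl` is hypothetical, and the `DROP EXTERNAL TABLE` step is an assumption about how the old definition would be replaced, since the slide shows only the two CREATE statements.

```python
def external_table_ddl(path: str, profile: str, server: str) -> str:
    """Render the historical_orders definition for a given storage location."""
    return (
        "CREATE EXTERNAL TABLE historical_orders\n"
        "(item int, amount money)\n"
        "LOCATION\n"
        f"('pxf://{path}?PROFILE={profile}&SERVER={server}')\n"
        "FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');"
    )


before = external_table_ddl("s3-bucket-orders/path", "s3:parquet", "s3_orders")
after = external_table_ddl("my.azuredatalakestore.net/path", "adl:parquet", "azure")

# Assumed migration step: drop the old definition, then create the new one.
migration = "DROP EXTERNAL TABLE historical_orders;\n" + after

# Lines that differ between the two definitions.
changed = [
    (old, new)
    for old, new in zip(before.splitlines(), after.splitlines())
    if old != new
]
print(migration)
print(len(changed))
```

Only the LOCATION URI line differs, so a query that joins historical_orders with product_catalog does not need to change when the data moves.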
Summary
Greenplum embraces the modern data landscape
● Scale and manage compute independently from storage
● Federate queries across heterogeneous data sources
● Cloud agnostic
Data is available for analytics with Greenplum no matter its form and where it resides!
#ScaleMatters
Greenplum External Table
Define an external table with the following:
● the schema of the external data
● the protocol pxf
● the location of the data in an external system
● the profile to identify the specific connector
● the COMPRESSION_CODEC of the data
● the format of the external data
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('pxf://<path to data>?PROFILE=[<profile_name>|<data_store:data_type>]&COMPRESSION_CODEC=[snappy|gzip|lzo|bzip2][&<CUSTOM_OPTIONS>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]'
Sample data:
cust, sku, amount, date
1234, ABC, $9.90, 4/01
1235, CDE, $8.80, 3/30
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION
('pxf:///2018/sales.csv?PROFILE=hdfs:text')
FORMAT 'TEXT'
PXF Multi Server
PXF supports accessing multiple external datastores simultaneously
● A server identifies an external datastore
● Staging directory servers/ under ${PXF_CONF}
● Contains relevant configuration files under servers/{server_name}/
○ HDFS: core-site.xml, hdfs-site.xml, ...
○ S3: s3-site.xml containing access properties
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( col_name data_type [,...] | LIKE other_table )
LOCATION ('pxf://<path to data>?PROFILE=<data_store:data_type>&SERVER=<server_name>')
CREATE EXTERNAL TABLE sales
(cust int, sku text, amount decimal, date date)
LOCATION ('pxf://s3-bucket-sales/2018/sales.csv?PROFILE=s3:text&SERVER=s3_sales')
FORMAT 'TEXT'
Sample data:
cust, sku, amount, date
1234, ABC, $9.90, 4/01
1235, CDE, $8.80, 3/30
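The relationship between the SERVER option and the configuration directory can be sketched in Python as follows. This is not PXF's own lookup code: the function names are hypothetical and `/usr/local/pxf-conf` is an assumed value for ${PXF_CONF}. The sketch only reproduces the servers/{server_name}/ layout described on the slide.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlsplit


def server_from_location(location: str) -> str:
    """Extract the SERVER option from a pxf:// LOCATION URI."""
    query = urlsplit(location).query
    # Option names are upper-cased here so SERVER and server both match.
    options = {key.upper(): values[0] for key, values in parse_qs(query).items()}
    return options["SERVER"]


def server_config_dir(pxf_conf: str, server_name: str) -> PurePosixPath:
    """Directory that would hold the configuration files for one server."""
    return PurePosixPath(pxf_conf) / "servers" / server_name


location = "pxf://s3-bucket-sales/2018/sales.csv?PROFILE=s3:text&SERVER=s3_sales"
server = server_from_location(location)

# /usr/local/pxf-conf is an assumed location for ${PXF_CONF}.
config_dir = server_config_dir("/usr/local/pxf-conf", server)
print(server)
print(config_dir / "s3-site.xml")
```

Each external table names its server, so two tables can point at different datastores, each with its own credentials file under its own directory.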
Performance in PXF
● Parallel access to data
● Predicate pushdown
● Column projection
SELECT item, amount (column projection)
WHERE state = 'CA' (predicate pushdown)
