Introduction to
Amazon Redshift
May, 2014
Abdullah Cetin CAVDAR / @accavdar
What's Amazon Redshift?
Amazon Redshift is a fast and powerful, fully
managed, petabyte-scale data warehouse service in
the cloud
https://aws.amazon.com/redshift/
Features
Petabyte scale, massively parallel
Relational data warehouse
Fully managed, zero admin
SSD and HDD platforms
$999/TB/Year
Architecture
Client Applications
Integrates with various data loading and ETL (Extract, Transform, and
Load) tools and business intelligence (BI) reporting, data mining, and
analytics tools
Redshift is based on industry-standard PostgreSQL, so most existing
SQL client applications will work with only minimal changes
Connections
Redshift communicates with client applications by using industry-
standard PostgreSQL JDBC and ODBC drivers
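As a rough illustration of the connection step, the sketch below assembles a PostgreSQL-style JDBC URL for a cluster. The host and database names used with it are hypothetical placeholders, and the `ssl=true` parameter is an assumption about how a client might request an encrypted connection; 5439 is Redshift's default port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoint:
    host: str          # hypothetical cluster endpoint, not from the slides
    database: str      # hypothetical database name
    port: int = 5439   # Redshift's default port


def jdbc_url(endpoint: ClusterEndpoint, ssl: bool = True) -> str:
    """Build a PostgreSQL-style JDBC URL for the given cluster endpoint."""
    url = f"jdbc:postgresql://{endpoint.host}:{endpoint.port}/{endpoint.database}"
    if ssl:
        # Assumption: the client asks the PostgreSQL driver for SSL.
        url += "?ssl=true"
    return url
```

A BI or ETL tool would normally take this URL in its connection dialog; nothing here opens a network connection.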
Clusters
A cluster is composed of one or more compute nodes
Leader Node coordinates the compute nodes and handles external
communication
Leader Node
Manage communications with client programs and communications
with compute nodes
Store metadata
Coordinate query execution
Compute Nodes
Execute the compiled code, send intermediate results back to the
leader node for final aggregation
Each compute node has its own dedicated CPU, memory, and attached disk
storage, which are determined by the node type
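The division of labour between leader and compute nodes can be sketched as a scatter-gather aggregation. This is a toy model, not Redshift's actual execution engine: each inner list stands for the rows held by one compute node, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def compute_node_partial(rows: Iterable[float]) -> Tuple[float, int]:
    """Work done on one compute node: a partial sum and row count."""
    total = 0.0
    count = 0
    for value in rows:
        total += value
        count += 1
    return total, count


def leader_average(node_rows: List[List[float]]) -> float:
    """Work done on the leader node: merge the partials into a final AVG."""
    partials = [compute_node_partial(rows) for rows in node_rows]
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows to aggregate")
    return total / count
```

Only the small `(sum, count)` pairs travel to the leader, so the intermediate results stay small however many rows each node scans.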
Databases
A cluster contains one or more databases
User data is stored on the compute nodes
Amazon Redshift is a Relational Database Management System
(RDBMS)
Amazon Redshift is optimized for high-performance analysis and
reporting of very large datasets
Amazon Redshift is based on PostgreSQL
Redshift reduces I/O
Column storage - read only the data you need
Data compression - analyzes and compresses your data
Zone Map
Keep track of the minimum and maximum values for each block
Skip over blocks that don't contain data needed for a given query
Minimize unnecessary I/O
Direct attached storage
Hardware optimized for high performance data processing
Large data block sizes
Large block sizes to make the most of each read
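To make the column-storage and zone-map ideas concrete, here is a minimal sketch under simplifying assumptions: a single integer column is split into fixed-size blocks, each block records its minimum and maximum, and a range scan skips any block whose range cannot match. The `Block` type, the function names, and the tiny block size are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]   # values of ONE column only (column storage)
    min_value: int      # zone-map entry: smallest value in the block
    max_value: int      # zone-map entry: largest value in the block


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks and record min/max for each one."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip blocks whose min/max range cannot contain a match.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read
```

Skipping is most effective when the column is sorted, because matching values then cluster in a few blocks; on randomly ordered data most blocks overlap the requested range and still have to be read.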
Redshift runs on optimized
hardware
Optimized for I/O intensive workloads
High disk density
Runs in HPC - fast network
Redshift parallelizes and
distributes everything
Query
Load
Backup/Restore
Resize
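One way to picture how work is spread out is hashing each row's distribution key to a slice. The sketch below uses CRC32 purely as a stand-in hash; the slides do not say which function Redshift uses, and the function and field names are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a distribution-key value to a slice (CRC32 is an assumed stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows: Iterable[dict], key_field: str, num_slices: int) -> Dict[int, List[dict]]:
    """Assign every row to a slice based on the hash of its key field."""
    slices: Dict[int, List[dict]] = {i: [] for i in range(num_slices)}
    for row in rows:
        target = slice_for_key(str(row[key_field]), num_slices)
        slices[target].append(row)
    return slices
```

Rows that share a key value always land on the same slice, which is what lets joins and aggregations on that key run locally; a key with few distinct values, however, leaves some slices with far more rows than others.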
Redshift is easy to use
Provision in minutes
Monitor query performance
Point and click resize
Built in security
Automatic backups
Redshift has security built-in
SSL to secure data in transit
Encryption to secure data at rest
AES 256 - hardware accelerated
All blocks on disk and in Amazon S3 encrypted
No direct access to compute nodes
Amazon VPC support
Redshift backs up your data
and recovers from failures
Replication within the cluster and backup to Amazon S3
Backups to Amazon S3 are continuous, automatic, and incremental
Continuous monitoring and automated recovery from failures
Able to restore snapshots to any Availability Zone
Use Cases
Traditional Enterprise DW
Reduce costs by extending DW rather than adding HW
Migrate completely from existing DW systems
Respond faster to business
Companies with Big Data
Improve performance by an order of magnitude
Make more data available for analysis
Access business data via standard reporting tools
SaaS Companies
Add analytic functionality to applications
Scale DW capacity as demand grows
Reduce HW and SW costs by an order of magnitude
Use Case: SkillPages
Data Architecture
Redshift Implementation
High Storage Extra Large (XL) DW Node
ETL Activities
Approx. 90 minutes including exports from RDBMS, copying to S3,
loading stage tables, loading target tables, vacuuming and
analysing tables
Schema
Compression
Retention
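The in-database part of the ETL run described above (load stage tables, load target tables, vacuum, analyse) might be expressed as the following sequence of statements. This is a sketch only: the table names, S3 prefix, and IAM role ARN passed in are hypothetical, the CSV format is an assumption, and the export from the RDBMS and the upload to S3 happen outside SQL and are not shown.

```python
from typing import List


def etl_statements(stage_table: str, target_table: str,
                   s3_prefix: str, iam_role: str) -> List[str]:
    """Return the SQL statements for one load cycle, in execution order.

    Identifiers are interpolated directly, so they must come from trusted
    configuration rather than user input.
    """
    return [
        # 1. Bulk-load the exported files from S3 into the stage table.
        f"COPY {stage_table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' FORMAT AS CSV;",
        # 2. Move the staged rows into the target table.
        f"INSERT INTO {target_table} SELECT * FROM {stage_table};",
        # 3. Reclaim space and re-sort rows after the load.
        f"VACUUM {target_table};",
        # 4. Refresh the statistics used by the query planner.
        f"ANALYZE {target_table};",
    ]
```

Each string would be executed in order through whichever SQL client is in use.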
DW Anatomy
Why Redshift works for
SkillPages?
Scale - MPP
Performance - Columnar data access and compression
Platform Integration - S3, Dynamo
Operational Advantages
Ease of Access
Cost
Best Practices
Avoid large numbers of singleton Data Manipulation Language (DML)
statements if possible
Use COPY for uploading large datasets
Choose SORT and DISTRIBUTION keys with care
Encode date and time with the TIMESTAMP data type
Experiment with WLM (Workload Manager) settings
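When COPY is not an option, the first practice (avoiding singleton DML) can be approximated by folding many rows into one multi-row INSERT. The helper below is a hypothetical sketch; the `%s` placeholder style is an assumption borrowed from common PostgreSQL drivers for Python, and the table and column names are illustrative.

```python
from typing import List, Sequence, Tuple


def multi_row_insert(table: str, columns: Sequence[str],
                     rows: Sequence[Sequence[object]]) -> Tuple[str, List[object]]:
    """Build one parameterized INSERT covering all rows, plus its flat parameter list."""
    if not rows:
        raise ValueError("at least one row is required")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
    # One "(%s, %s, ...)" group per row; values are passed separately as parameters.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_placeholder] * len(rows))
        + ";"
    )
    params = [value for row in rows for value in row]
    return sql, params
```

For example, two events with TIMESTAMP-compatible strings produce a single statement with four parameters instead of two separate INSERTs.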
Slides
https://github.com/accavdar/AmazonRedshift
THE END
by Abdullah Cetin CAVDAR / @accavdar
