Build multi-region Data Warehouse
on AWS
Mr. Vuong Tran
Cloud Solution Architect
OSAM
#1
Mr. Thai Ngo
Cloud Solution Architect
OSAM
AWS User Group Vietnam
Build multi region data warehouse on AWS - AWSVNUG
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.SUMMIT
Agenda
1. What is a Data Warehouse?
2. Benefits of using AWS for Data Warehousing
3. Why a Multi-region Data Warehouse?
4. AWS Architecture
What is a Data Warehouse?
What is a Data Warehouse?
• A data warehouse (DW) is a central repository of integrated data from one
or more disparate sources, so that the data can be compared and analyzed
for greater business intelligence.
• A DW is a blend of technologies and components that allows the
strategic use of data.
What is a Data Warehouse used for?
• Reporting
• Data Analysis
• Core component of business intelligence
Benefits of using AWS for Data Warehousing
Global Infrastructure
19 regions worldwide
More regions coming: Bahrain, Cape Town, Hong Kong SAR, and Milan
What are the main components of a data warehouse on AWS?
Benefits of using AWS for Data Warehousing
• Better decision making
• Consolidates data from many sources
• Data quality, consistency, and
accuracy
• Historical intelligence
• Separates analytics processing from
transactional databases, improving
performance of both systems
Why a Multi-region Data Warehouse?
Global business
Data Location
● Data is stored in different regions
● Data sits on different networks
Data Process
Data is processed separately, and in different ways, in each region
Data Aggregation
Data is aggregated across regions for data mining and business analytics
AWS Architecture
From the beginning
Single Region
Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the
cloud. You can start with just a few hundred gigabytes of data and scale to a
petabyte or more. This enables you to use your data to acquire new insights for your
business and customers.
Amazon Redshift
• Scale from 160 GB to 2 PB online
• Automatic streaming backup to, and restore from, S3
• Automatic failover and recovery
• ANSI SQL interface
• Load data from S3, DynamoDB, and EMR
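As a rough illustration of the "load data from S3" bullet, the sketch below builds a Redshift COPY statement for CSV files in an S3 prefix. The table name, bucket, IAM role ARN, and region are hypothetical placeholders, and the helper `build_copy_statement` is not from the talk; it only assembles SQL text and does not connect to a cluster or escape its inputs.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str, bucket_region: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from an S3 prefix.

    All arguments are illustrative; values are interpolated as-is, so this
    sketch is only suitable for trusted, hard-coded inputs.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1\n"
        # REGION is only needed when the bucket is in a different region
        # than the cluster; it is included here for clarity.
        f"REGION '{bucket_region}';"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(
        build_copy_statement(
            table="sales",
            s3_uri="s3://dw-replica-bucket/sales/",
            iam_role_arn="arn:aws:iam::123456789012:role/RedshiftCopyRole",
            bucket_region="ap-southeast-1",
        )
    )
```

The resulting statement would typically be run by an ETL job or a SQL client connected to the regional cluster.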
AWS Glue
Discover: Automatically discover and categorize your data, making it
immediately searchable and queryable across data sources.
Develop: Generate code to clean, enrich, and reliably move data between
various data sources; you can also use your favorite tools to build
ETL jobs.
Deploy: Run your jobs on a serverless, fully managed, scale-out
environment, with no compute resources to provision or manage.
Amazon S3
Secure, durable, highly-scalable object storage
Accessible via a simple web services interface
Store & retrieve any amount of data
Integrate with many AWS services
Amazon S3
Backup & Archiving
Content Storage & Distribution
Big Data Analytics
Static Website Hosting
Cloud-native Application Data
Disaster Recovery
Then expanding
Multi-Region Architecture
- Route 53: geolocation routing
- A separate Redshift data warehouse in each region
- Glue jobs differ between regions
- Data is stored separately in each region
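To make the geolocation routing bullet more concrete, here is a minimal sketch that builds a Route 53 change batch with one geolocation record per continent plus a default record. The domain names and the continent-to-endpoint mapping are hypothetical; the talk does not say which endpoints sit behind Route 53. The dictionary follows the shape expected by the Route 53 `ChangeResourceRecordSets` API (for example via boto3's `change_resource_record_sets`), but nothing is sent to AWS here.

```python
from typing import Dict


def geolocation_record(name: str, set_id: str, geo: Dict[str, str], target: str, ttl: int = 60) -> dict:
    """Build one UPSERT change for a geolocation CNAME record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_id,  # must be unique per record in the set
            "GeoLocation": geo,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


def build_change_batch(name: str, continent_targets: Dict[str, str], default_target: str) -> dict:
    """Map continent codes (e.g. "AS", "EU") to regional endpoints.

    A default record (CountryCode "*") catches users whose location
    does not match any of the listed continents.
    """
    changes = [
        geolocation_record(name, f"geo-{code}", {"ContinentCode": code}, target)
        for code, target in continent_targets.items()
    ]
    changes.append(geolocation_record(name, "geo-default", {"CountryCode": "*"}, default_target))
    return {"Comment": "Geolocation routing to regional endpoints", "Changes": changes}


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    batch = build_change_batch(
        "bi.example.com.",
        {"AS": "bi-sg.example.com", "EU": "bi-eu.example.com"},
        default_target="bi-sg.example.com",
    )
    for change in batch["Changes"]:
        rrs = change["ResourceRecordSet"]
        print(rrs["SetIdentifier"], rrs["GeoLocation"], rrs["ResourceRecords"][0]["Value"])
```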
Multi-Region Architecture
What about data from other regions?
How can we manage and access data from other regions in a secure way?
We could do it like this
Rely on BI tool features
- Support for multiple sources
- Support for aggregation across multiple sources
Challenges
- Latency from remote sources
- Customizing data from other regions
Or like this
Glue
- Send data to multiple DW endpoints per job
Challenges
- Managing Glue jobs across regions
- Complex private networking with VPC peering when running Glue in a VPC
Better solution
Architecture Diagram
S3 with cross-region replication
Pros
- Replication is handled by AWS
- Choose a subset of the data to replicate by S3 prefix
- Simplify multi-region replication configuration by using CloudFormation
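Replicating only selected prefixes can be sketched as below. The function builds an S3 replication configuration with one rule per prefix, in the shape accepted by boto3's `put_bucket_replication(Bucket=..., ReplicationConfiguration=...)`. The role ARN, bucket ARN, and prefixes are hypothetical, and the talk itself configures replication through CloudFormation rather than a script, so treat this as an illustration of the rule structure only. Versioning must be enabled on both the source and destination buckets for replication to work.

```python
from typing import List


def build_replication_config(role_arn: str, destination_bucket_arn: str, prefixes: List[str]) -> dict:
    """Build a replication configuration with one prefix-filtered rule per prefix."""
    rules = []
    for priority, prefix in enumerate(prefixes, start=1):
        rules.append(
            {
                # Rule IDs are derived from the prefix, e.g. "orders/2019/" -> "replicate-orders-2019".
                "ID": "replicate-" + prefix.strip("/").replace("/", "-"),
                "Priority": priority,  # required when a rule uses Filter
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": destination_bucket_arn},
            }
        )
    return {"Role": role_arn, "Rules": rules}


if __name__ == "__main__":
    # Hypothetical ARNs and prefixes, for illustration only.
    config = build_replication_config(
        role_arn="arn:aws:iam::123456789012:role/S3ReplicationRole",
        destination_bucket_arn="arn:aws:s3:::dw-replica-bucket",
        prefixes=["sales/", "orders/2019/"],
    )
    for rule in config["Rules"]:
        print(rule["ID"], rule["Filter"]["Prefix"])
```

Each regional bucket would carry its own configuration pointing at the buckets in the other regions, so that every region receives only the prefixes it needs.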
Pros
- Maximize the ability to process and mine data from other regions
- Focus only on processing data, not on sending data to many endpoints
Pros
- No complex networking, no VPC peering
- Isolate ETL processes between regions and between teams
Pros
- Data is fully prepared in each region's Redshift cluster
- Reduce latency from BI tools to Redshift
Cons
Redundant data, which can be mitigated:
- Replicate only what we need
- Apply S3 object lifecycle rules
- Leverage Lambda to clean up Redshift data
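The two cleanup ideas above could look roughly like this. The first helper builds an S3 lifecycle configuration that expires replicated objects under a prefix after a number of days, in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The second builds the DELETE statement that a scheduled Lambda function might run against Redshift. The prefix, table, column, and retention periods are assumptions for illustration, not values from the talk.

```python
def build_lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Expire objects under `prefix` once they are `expire_after_days` old."""
    if expire_after_days <= 0:
        raise ValueError("expire_after_days must be positive")
    return {
        "Rules": [
            {
                "ID": "expire-" + prefix.strip("/").replace("/", "-"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def build_cleanup_sql(table: str, date_column: str, retention_days: int) -> str:
    """Return a Redshift DELETE that drops rows older than the retention window.

    Identifiers are interpolated as-is, so pass only trusted, hard-coded names.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return (
        f"DELETE FROM {table} "
        f"WHERE {date_column} < DATEADD(day, -{retention_days}, CURRENT_DATE);"
    )


if __name__ == "__main__":
    # Hypothetical prefix, table, and retention periods.
    print(build_lifecycle_config("replicated/sales/", 30))
    print(build_cleanup_sql("sales_replica", "order_date", 90))
```

A Lambda function triggered on a schedule could submit the generated statement to the regional cluster; how it connects is left out of this sketch.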
NGO QUOC THAI
Cloud Solution Architect
OSAM
TRAN LE VUONG
Cloud Solution Architect
OSAM
Thank you
