SQL Server To Redshift
Data Load Using SSIS
Reach for the Clouds, Inc.
Next Generation SSIS Tasks and Connectors Series
AUTHOR:
NAYAN PATEL | SR. ETL SSIS ARCHITECT
NPATEL@RFTCLOUDS.COM
Content
• Introduction – SQL Server to Redshift Load
• Video Tutorial – Redshift Data Load
• Right way but hard way
• Steps for Amazon Redshift Data Load from On-Premise files or RDBMS (e.g. MySQL, SQL Server)
• Doing it the easy way
• Should I use SSIS to load Redshift
• Setup your Amazon Redshift Cluster
• Add inbound rule for Redshift Cluster
• Automate Redshift Cluster Creation
• Create Sample table and data in Source – (in this example SQL Server)
• Create Sample table in Amazon Redshift
• SQL Server to Redshift Data Load using SSIS
• Conclusion - Related Links
Introduction – SQL Server to Redshift Load
• Before we look at loading data from SQL Server to Redshift using SSIS, let's
cover what Amazon Redshift (sometimes referred to as AWS Redshift) is.
Amazon Redshift is a cloud-based data warehouse service. This type of
system is also referred to as MPP (Massively Parallel Processing).
Amazon Redshift uses a highly modified version of the PostgreSQL engine
behind the scenes. It lets you scale as you go, at a very low cost
compared to an on-site dedicated hardware/software approach.
Right way but hard way
• If you read the guidelines published by Amazon on Redshift data
loading, you will quickly realize that there is a lot to do under the
covers to get it right. Here are the steps you will have to perform
when loading data into Redshift from your on-premises server (the data
can be sitting in files or in a relational source).
Steps for Amazon Redshift Data Load from On-Premise files or RDBMS (e.g. MySQL, SQL Server)
• Export local RDBMS data to flat files (make sure you remove invalid characters and apply escape
sequences during export)
• Split files into 10-15 MB chunks to get optimal performance during upload and the final data load
• Compress files to *.gz format so you don't end up with a $1000 surprise bill :) In my case, text files
shrank by a factor of 10-20
• List all file names in a manifest file so that when you issue the COPY command to Redshift, they are
treated as one unit of load
• Upload the manifest file to an Amazon S3 bucket
• Upload the local *.gz files to the Amazon S3 bucket
• Issue the Redshift COPY command with the appropriate options
• Schedule file archiving for the on-premises files and the S3 staging area on AWS
• Capture errors and set up restartability in case something fails
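The split, compress, and manifest steps above can be sketched in Python using only the standard library. This is a minimal illustration, not what the SSIS task does internally: the function names, the part_NNNN.csv.gz naming scheme, and the bucket and prefix values are assumptions made for the example. The manifest layout (an "entries" list of "url" and "mandatory" pairs) follows the format Redshift expects for COPY with the MANIFEST option. Uploading to S3 is left out.

```python
import csv
import gzip
import io
import json
from pathlib import Path


def split_and_compress(src_csv, out_dir, max_bytes=10 * 1024 * 1024):
    """Split a CSV file into gzip parts of at most ~max_bytes (uncompressed) each.

    Rows are read with the csv module so that a quoted field containing a
    newline is never cut in half. Returns the list of part paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    size = 0
    with open(src_csv, newline="", encoding="utf-8") as src:
        for row in csv.reader(src):
            # Serialize one row so we know how many bytes it adds.
            buf = io.StringIO()
            csv.writer(buf).writerow(row)
            data = buf.getvalue()
            nbytes = len(data.encode("utf-8"))
            # Start a new part for the first row or when the current part is full.
            if out is None or size + nbytes > max_bytes:
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(parts):04d}.csv.gz"  # hypothetical naming
                out = gzip.open(path, "wt", newline="", encoding="utf-8")
                parts.append(path)
                size = 0
            out.write(data)
            size += nbytes
    if out is not None:
        out.close()
    return parts


def build_manifest(parts, bucket, prefix):
    """Build a Redshift COPY manifest listing every part as a mandatory file."""
    entries = [
        {"url": f"s3://{bucket}/{prefix}/{Path(p).name}", "mandatory": True}
        for p in parts
    ]
    return {"entries": entries}


def write_manifest(manifest, path):
    """Save the manifest as JSON so it can be uploaded next to the parts."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

A typical call would be split_and_compress("orders.csv", "stage") followed by build_manifest(parts, "my-bucket", "stage/orders"), where the file, bucket, and prefix names are placeholders. Marking every entry as mandatory makes COPY fail if a part is missing from S3, which is usually what you want when the files form one unit of load.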
Doing it the easy way
• If you are not sure you are ready to code the many steps listed above,
you can use the Amazon Redshift Data Transfer Task.
• In the next few sections we will describe how to set up your Redshift
cluster for demo purposes and load data from SQL Server to Redshift
using SSIS.
Should I use SSIS to load Redshift
• If you are wondering which approach to use to load data, consider the following questions:
• Do you have existing ETL processes written in SSIS?
• Do you need a more visual approach and better workflow management (which SSIS provides)?
• Do you need connection string encryption and other goodies offered by SSIS, such as native
logging and passing parameters from the SSIS environment?
• Do you have SSIS expertise available in-house, or are you better off staying with command-line
scripts?
• Do you need to create a workflow that can run on a server where SSIS is not installed?
Setup your Amazon Redshift Cluster
Note: Skip this step if you have already set up your Redshift cluster.
1. Log in to your AWS console and click the Redshift icon, or go directly to the Redshift console.
2. Click Launch Cluster.
3. On the Cluster Details page, specify the cluster identifier, database name, port, master user,
and password. Click Continue to go to the next page.
4. On the Node Configuration page, specify the node type (this is the VM type), the cluster type,
and the number of nodes. If you are trying this under the free tier, select the smallest node
possible (in this case it was dw2.large). Click Continue to go to the next page.
5. On the Additional Configuration page, you can pick the VPC (Virtual Private Cloud), the security
group for the cluster, and the encryption options. For demo purposes, select the options shown in
the screenshot below. Click Continue to review your settings, then click Create Cluster.
6. Give it a few minutes (5-10) while your cluster is being created. Then go back to the same page
and review the cluster status and other properties. Copy the cluster endpoint somewhere, because
we will need it later.
Add inbound rule for Redshift Cluster
Note: Skip this step if you have already added your IP address to an inbound rule.
By default you cannot connect to an Amazon Redshift cluster from outside the AWS network (e.g. from
your on-premises machine). If you wish to connect, you must add an inbound rule that allows your
requests to reach the Redshift cluster on a specific port.
To create a new inbound rule, perform the following steps:
1. On the Redshift home page, click the [Security] tab. You may see a notice, depending on which
region you are in. Click the [Go to the EC2 Console] link, or go directly to EC2 by clicking the
Services -> EC2 menu at the top.
2. On the EC2 Security Groups page, select the security group attached to your Redshift cluster,
then click the Inbound tab in the bottom pane.
3. On the Inbound tab, click Edit to modify the default entry or to add a new rule.
4. Click Add Rule if you wish to add a new entry; otherwise edit the existing entry as shown below.
Then click Save.
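Once the rule is saved, a quick way to confirm that it works is to open a TCP connection from your machine to the cluster endpoint copied earlier. The sketch below is only an illustration in Python: the function names are made up for this example, and it assumes the endpoint is written as host:port, falling back to 5439 (the Redshift default port) when no port is given.

```python
import socket


def parse_endpoint(endpoint, default_port=5439):
    """Split 'host:port' into (host, port); use default_port if no port is given."""
    endpoint = endpoint.strip()
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, default_port
    return host, int(port)


def can_connect(endpoint, timeout=5.0):
    """Return True if a TCP connection to the endpoint succeeds within timeout."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If can_connect returns False for your cluster endpoint, the usual causes are a missing or wrong inbound rule, a different port than the one configured on the cluster, or a cluster that is not publicly accessible. A successful TCP connection only shows that the port is reachable; it does not validate the database credentials.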
Automate Redshift Cluster Creation
If you need to automate Redshift cluster creation, or any of the following operations, check out the
Redshift Cluster Management Task:
• Automate the Amazon Redshift cluster create action in a few clicks. You can also add an
access security rule.
• Automate the Amazon Redshift cluster delete action
• Fetch an Amazon Redshift cluster property into an SSIS variable (e.g. fetch the cluster status)
• Fetch all clusters and their properties as a DataTable (use a ForEach Loop to iterate
through all clusters)
• Automate Redshift cluster snapshot creation
• Automate the Redshift cluster snapshot delete action
• Support for waiting until a cluster operation is done
Create Sample table and data in Source – (in this example SQL Server)
Note: Skip this step if you wish to use your own table. If you do, please ignore the steps and
screenshots in this article that are specific to the sample table.
For this demo we will use the free Northwind sample database supplied by Microsoft.
• Download the sample database from here.
• Extract the zip file, open the *.sql file, and run it to create a new database with sample
tables and data.
Create Sample table in Amazon Redshift
4. Double-click the task to open its UI.
5. Click [New] next to the connection.
6. Configure the Redshift connection properties and click Test.
7. Once the connection test succeeds, click OK to save the connection details.
8. Enter the following script in the SQL textbox and click OK to save it.
9. Now right-click the task and execute it. This should create the new table in Redshift.
SQL Server to Redshift Data Load using SSIS
Once the table is created, let's do the real work of moving data from SQL Server to Amazon Redshift.
Perform the following steps to configure the SSIS Amazon Redshift Data Transfer Task:
1. Drag the Amazon Redshift Data Transfer Task onto the SSIS designer surface.
2. Double-click the task to edit its properties.
3. Select Action: In the Action dropdown at the top, select the Bulk Import to Redshift from any RDBMS
(e.g. MySQL, Oracle, SQL Server) option.
4. Configure Source: On the Source tab, click [New] next to the connection dropdown and configure the
source connection, or pick an existing connection. In our case we are extracting data from a SQL Server
database (Northwind) on the local server.
Enter the following SQL query to extract 100,000 rows from SQL Server
5. Configure Source Staging Area: On the Source tab, enter the folder location where staging files
will be saved before they are uploaded to Redshift (see the screen above).
6. Configure Target: On the Target tab, select an existing Redshift connection manager (or create a
new one), then select the target table you want to load from the dropdown. If you have a long list
of tables, enter the schema name in the Schema Filter text box and click Refresh to reload the
table dropdown with fewer items.
7. Configure Reload Option and Target Staging Area: On the Target tab, check the Truncate Target
Table option if you want a full reload each time you execute this task; leave it unchecked to
append records. You also have to specify the Amazon S3 staging area where Redshift will look for
the files to load.
8. Configure File Format: We are going to generate CSV files for the Redshift load, so make sure you
select the correct column delimiter. Also check the Always Compress File option to reduce
bandwidth usage.
9. Configure Archive Options: On the Archive tab, you can specify how to archive the source and
target files that were generated. Source files are CSV files, and source stage files are *.gz
files (if you select compression). Target stage files are either CSV or *.gz files. By default,
source CSV files are kept and all other stage files are deleted. See the screenshot below.
10. Configure Advanced Options: On the Advanced Options tab, you can fine-tune the load process, such
as how to handle NULL data and how to handle data truncation. Read the help file for more
information.
11. Configure Error Handling Options: On the Error Handling tab, you can specify how many errors to
ignore before failing the entire load. You can also replace invalid characters during your load
if you check the [Allow Invalid Characters] option.
12. Now we are finally ready to execute our SSIS package. Once it is done, you can review the log.
Here is the sample execution log.
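For comparison with the hand-coded approach, the settings chosen in steps 7 to 11 correspond roughly to options of the Redshift COPY command. The Python sketch below builds such a statement as text. It is an assumption about how the options map, not a description of what the SSIS task actually sends; the function name, table name, manifest URL, and IAM role are hypothetical placeholders. The COPY keywords used (MANIFEST, CSV, DELIMITER, GZIP, MAXERROR, ACCEPTINVCHARS, TRUNCATECOLUMNS) are standard Redshift options.

```python
def build_load_statements(table, manifest_url, iam_role, *, delimiter=",",
                          compressed=True, truncate_target=False,
                          max_errors=0, replace_invalid_with=None,
                          truncate_columns=False):
    """Return the SQL statements for one load, in execution order.

    Values are interpolated directly into the SQL text, so this is only
    suitable for trusted, hard-coded inputs (illustration only).
    """
    options = ["MANIFEST", "CSV", f"DELIMITER '{delimiter}'"]
    if compressed:
        options.append("GZIP")  # staged files are *.gz
    if max_errors > 0:
        options.append(f"MAXERROR {max_errors}")  # errors tolerated before failing
    if replace_invalid_with is not None:
        # Replace invalid UTF-8 characters instead of rejecting the row.
        options.append(f"ACCEPTINVCHARS AS '{replace_invalid_with}'")
    if truncate_columns:
        options.append("TRUNCATECOLUMNS")  # cut over-long strings to column width

    copy_sql = (
        f"COPY {table} FROM '{manifest_url}' IAM_ROLE '{iam_role}' "
        + " ".join(options) + ";"
    )

    statements = []
    if truncate_target:
        statements.append(f"TRUNCATE TABLE {table};")  # full reload
    statements.append(copy_sql)
    return statements
```

Calling build_load_statements("public.orders", "s3://my-bucket/stage/orders.manifest", "arn:aws:iam::123456789012:role/MyRedshiftRole", truncate_target=True, max_errors=10) returns a TRUNCATE statement followed by the COPY statement. Leaving truncate_target off gives the append behavior described in step 7.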
Conclusion
In this article we outlined the steps needed to load data into Redshift from a relational source (e.g.
MySQL, SQL Server, Oracle). Redshift is a great way to offload your expensive data warehouse to the cloud
so you don't have to worry about costly maintenance and future growth. With Redshift you can grow your
data size from gigabytes to petabytes. The SSIS Amazon Redshift Data Transfer Task gives you an easy way
to maintain your Redshift data transfer process, with ease of use and fast load options (for full or
incremental loads).
Again, this was just a proof of concept. We encourage you to do your own benchmarking and research to see
which approach best suits your needs.
• Related Links:
• SSIS Amazon Redshift Data Transfer Task

