© 2016 IDERA, Inc. All rights reserved.
Proprietary and confidential.
© 2018 IDERA, Inc. All rights reserved.
THE EVER GROWING SCIENCE OF
DATABASE MIGRATIONS
Presented by Bert Scalzo, PhD, MBA & Oracle ACE
May 16th, 2018
BOOKS BY AUTHOR
Fall 2017
SESSION DESCRIPTION
Many information technology professionals may not recognize it, but the bulk of
their work has been and continues to be nothing more than database
migrations. In the old days the job was to share files across systems, then to
move files into relational databases, then to load data warehouses, and now
we're moving to NoSQL and the cloud.
In this presentation we'll delve into the ever growing and increasingly complex
world of database migrations. Considerations include:
▪ What issues must be planned for and overcome
▪ What problems are likely to occur
▪ What types of tools exist
BACKGROUND & TARGET AUDIENCE
▪ Over 30 years I’ve worked at both commercial and government
institutions in various roles, from entry level QA tester all the way up to
Director of IT
▪ Toughest role has been as lead architect/DBA for mission critical production
systems with extremely strict support responsibilities
▪ I’ve also spent time working for various software development companies,
where programmers don’t have production support responsibilities but
nonetheless build tools for the people who support mission critical systems
▪ This topic will concentrate on traditional business application developers
HOW MUCH TIME CODING
I’ve seen many numbers
quoted, but the common
belief seems to be roughly
25% of the time
OF THAT 25% - MOST IS SIMPLY MOVING DATA
▪ Several studies have put forward that most IT professionals spend the
vast majority of their time simply moving data (as high as 80% - 85%)
▪ My 30 years of experience give me no reason to doubt those findings
▪ Thus we might actually be better off being paid by the gigabyte of data
that we move rather than by salary
It does make me wonder why I had to learn
so many advanced data types and complex
algorithms when much of my work has been
as a simple data mule …
Sisyphus was punished for his self-
aggrandizing craftiness and deceitfulness by
being forced to roll an immense boulder up a
hill only for it to roll down when it nears the top,
repeating this action for eternity
Programmers are punished for their self-
aggrandizing craftiness and deceitfulness
by being forced to move immense amounts
of data from source to target, repeating this
action for eternity
THE STATES OF MATTER
▪ Remember that data is any company’s most
important asset, so working with that asset is
both natural and to be expected
▪ Data, like matter, has states: it’s either bound
(i.e. at rest) or liberated (i.e. in motion)
▪ DBAs handle data at rest while programmers
put data in motion via apps or data migrations
▪ Migrating data may not be sexy, but it’s what
application developers are asked to do most
of the time (i.e. I need a copy of that …)
EXAMPLES OF TRADITIONAL DATA MIGRATION
▪ Traditional OLTP “data feeds” from one system to another
▪ Population of data warehouse directly from source systems (ETL)
▪ Population of a centralized Operational Data Store (ODS)
▪ Population of data warehouse (DW) indirectly from central ODS
▪ Population of data marts from either source systems, ODS or DW
▪ Data extraction from source systems into cubes for business analysis
▪ Data extraction for delivery to business users (e.g. CSV to import into Excel)
THE MODERN WORLD OF DATA MIGRATIONS
▪ Data engineers construct massive big data reservoirs sometimes referred
to as “data pools” for investigation by both data scientists & data analysts
(often NoSQL) as businesses embrace data analytics & data mining
▪ Consolidating databases to save on licensing costs
▪ Moving from expensive database platform to an open source alternative
▪ Moving portions of databases into the cloud
▪ Moving entire databases into the cloud
▪ Combos of all the above
Tantalus was made to stand in a pool of
water beneath a fruit tree with low
branches, with the fruit ever eluding his
grasp, and the water always receding
before he could take a drink.
Analysts are made to stand in a pool of
data beneath the tree of knowledge with
low branches, with the facts ever eluding
their grasp, and information always
receding before they can claim success.
THE EVER GROWING DATA DEMAND
▪ Businesses are addicted to information since they see it as an edge
▪ Technology improvements have lowered costs to keep historical data
▪ Data mining, data analytics and data science all added fuel to this fire
▪ The cloud makes all this quicker and cheaper to deploy …
THE NATURE OF DATA GROWTH
That’s 180 billion terabytes of data!!!
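As a quick sanity check on that figure, and assuming decimal (SI) storage units, 180 billion terabytes works out to 180 zettabytes. The constant names below are only for illustration:

```python
# Decimal (SI) storage units are assumed; binary units would give a
# slightly different result.
TERABYTE = 10**12
ZETTABYTE = 10**21

total_bytes = 180 * 10**9 * TERABYTE   # 180 billion terabytes
zettabytes = total_bytes // ZETTABYTE

print(zettabytes)  # 180
```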
I feel self-doubt whether I'm doing
something hard or easy.
Sigourney Weaver
Drones can be useful tools, and I am all
about useful tools. One of my mottos is
'the right tool for the right job.'
Martha Stewart
CUSTOM CODE
▪ If source and target are the same, have programmers write custom code in the
database language (e.g. T-SQL or PL/SQL) to move data
▪ If source and target are different, have programmers write custom code in a
3GL or 4GL scripting language (e.g. PowerShell or Perl) to move data
▪ Even if you know the systems and the data, the time required can be
prohibitive, but management often mistakenly assumes this will be
cheaper than buying tools
▪ Performance is often poor since multithreaded, parallelized code is generally
hard to write and debug (and few are good at it anyhow)
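To make the custom-code option concrete, here is a minimal single-threaded sketch of a hand-written table copy. It assumes two in-memory SQLite databases standing in for the real source and target, and a hypothetical `orders` table and `copy_table` helper; a real migration would also need type mapping, error handling and restart logic.

```python
import sqlite3


def copy_table(source, target, table, columns, batch_size=1000):
    """Copy rows from source to target in batches; return the row count."""
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    cursor = source.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        target.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", batch
        )
        target.commit()  # commit per batch to keep transactions small
        copied += len(batch)
    return copied


# Stand-in databases (assumption: SQLite replaces the real systems here).
src_db = sqlite3.connect(":memory:")
tgt_db = sqlite3.connect(":memory:")
for conn in (src_db, tgt_db):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 2501)]
)
src_db.commit()

copied = copy_table(src_db, tgt_db, "orders", ["id", "amount"])
print(copied)  # 2500
```

Even this toy version hints at where the time goes: everything beyond the loop itself (parallelism, conversions, recovery) still has to be written and debugged by hand.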
EXPORT / IMPORT – DATA (WILL SHOW IN DEMO)
▪ If source and target same then often options exist to use export/import
(with proprietary file format) or backup/restore as a data movement tool
▪ If source and target are different, export can sometimes dump data
either as SQL statements or as raw data (e.g. CSV file), but the
target should then ideally offer a high speed data loader (e.g. SQL*Loader)
▪ Sometimes these tools offer parallel capabilities but make sure to know
the caveats such as using more space due to “holes” in the data from
parallel inserts not filling entire pages / blocks
▪ High disk space cost since must have three copies of data existing at the
same time (i.e. source, target and exported files)
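The raw-data route can be sketched as an export to CSV followed by a load. This is only an illustration: SQLite and an in-memory buffer stand in for the real databases and the export file, and `export_csv`, `load_csv` and the `customers` table are hypothetical names.

```python
import csv
import io
import sqlite3


def export_csv(conn, table, out):
    """Dump a table as CSV with a header row."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)


def load_csv(conn, table, src):
    """Load CSV rows into a table whose columns match the header row."""
    reader = csv.reader(src)
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})",
        reader,
    )
    conn.commit()


csv_source = sqlite3.connect(":memory:")
csv_target = sqlite3.connect(":memory:")
for conn in (csv_source, csv_target):
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
csv_source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Smith, John"), (2, 'Anne "Annie" Lee')],
)
csv_source.commit()

export_file = io.StringIO()  # the exported copy of the data
export_csv(csv_source, "customers", export_file)
export_file.seek(0)
load_csv(csv_target, "customers", export_file)

loaded = csv_target.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(loaded)
```

At the end of the run the same rows exist three times (source table, export buffer, target table), which is the disk space cost noted above. Note also that CSV carries no type information and Python's `csv` module writes `None` as an empty string, so NULL handling needs separate care.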
EXPORT / IMPORT – SQL (WILL SHOW IN DEMO)
▪ Often viewed as a “safe & easy” choice since SQL INSERT command is
generally fairly uniform across databases with few proprietary additions
▪ Need one of two capabilities: generated SQL INSERT contains COMMIT
every N statements or a SQL command line tool with auto commit option
▪ Generally neither parallelized nor multithreaded (i.e. slow option)
▪ Difficulties with highly complex data type conversion rules for date, time,
and datetime values, and with single/double quoted strings that need escaping
▪ High disk space cost since must have three copies of data existing at the
same time (i.e. source, target and exported files)
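The following sketch shows what a generated INSERT script with a COMMIT every N statements might look like, including the quote escaping and date formatting mentioned above. The helper names are hypothetical, the literal rules are deliberately simplistic (ISO dates, doubled single quotes), and the explicit `BEGIN` is there only because SQLite, used here to check the script, needs an open transaction before `COMMIT`.

```python
import sqlite3
from datetime import date, datetime


def sql_literal(value):
    """Render a Python value as a SQL literal (simplified rules)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (date, datetime)):
        return "'" + value.isoformat() + "'"
    return "'" + str(value).replace("'", "''") + "'"  # double single quotes


def generate_insert_script(table, columns, rows, commit_every=2):
    """Build an INSERT script that commits every `commit_every` statements."""
    lines = []
    pending = 0
    for row in rows:
        if pending == 0:
            lines.append("BEGIN;")
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
        )
        pending += 1
        if pending == commit_every:
            lines.append("COMMIT;")
            pending = 0
    if pending:
        lines.append("COMMIT;")  # close the final partial batch
    return "\n".join(lines)


sample_rows = [
    (1, "O'Brien", date(2018, 5, 16)),
    (2, 'say "hi"', None),
    (3, "plain", date(2017, 1, 1)),
]
script = generate_insert_script("people", ["id", "name", "signup"], sample_rows)
print(script)

check_db = sqlite3.connect(":memory:")
check_db.execute("CREATE TABLE people (id INTEGER, name TEXT, signup TEXT)")
check_db.executescript(script)
```

Different targets expect different date formats and escaping rules, which is where the "safe & easy" option tends to get complicated.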
EXTRACT TRANSFORM LOAD (ETL)
▪ ETL tools are quite popular and there are many open source tools (free)
▪ ETL tools were designed for and are great at performing highly complex
transformations, however simple tasks are often made more tedious by
the overhead of such a robust and powerful design
▪ ETL tools are HUGE, often a couple gigabytes for just the client GUI and
also requiring deployment of a server agent on “close vicinity” servers
▪ ETL tool learning curves are significant and unless used daily often must
be relearned when needed
▪ ETL tools may not best leverage database platform specific capabilities
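For readers who have not used an ETL tool, the three stages can be sketched in a few lines. This is a toy pipeline under stated assumptions: the record layout, the `extract`/`transform`/`load` helpers and the list used as a "warehouse" are all hypothetical, and a real ETL tool adds scheduling, lineage, error handling and far richer transformations.

```python
from datetime import datetime


def extract(rows):
    """Extract: yield raw records from the source (here, a plain list)."""
    yield from rows


def transform(records):
    """Transform: normalize names, convert US dates to ISO, dollars to cents."""
    for rec in records:
        yield {
            "customer": rec["customer"].strip().title(),
            "order_date": datetime.strptime(rec["order_date"], "%m/%d/%Y")
            .date()
            .isoformat(),
            "amount_cents": round(float(rec["amount"]) * 100),
        }


def load(records, target):
    """Load: append transformed records to the target; return its size."""
    for rec in records:
        target.append(rec)
    return len(target)


raw_orders = [
    {"customer": "  alice SMITH ", "order_date": "05/16/2018", "amount": "19.99"},
    {"customer": "bob jones", "order_date": "12/01/2017", "amount": "7.5"},
]
warehouse = []
loaded_count = load(transform(extract(raw_orders)), warehouse)
print(loaded_count, warehouse[0])
```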
BASIC DATA MOVERS (WILL SHOW IN DEMO)
▪ Good choice when all you need to do is to move data from point A to B
▪ Often offer good parallel and multithreaded capabilities for speed
▪ Handle all the highly complex data type conversion rules for date, time,
and datetime values, and single/double quoted strings that need escaping
▪ Client based tools only good for relatively small databases (PC CPU and
memory plus network bottlenecks)
▪ Some tools offer ability for client to run on server or an agent on server
▪ There are some free tools but they generally require paying to scale up
REPLICATION BASED
▪ This is a somewhat newer and novel approach that is gaining traction
▪ Replication tools offering cross database support can be used for either
synchronous or asynchronous replication as the data movement engine
▪ Often offer excellent parallel and multithreaded capabilities for speed
▪ Handle all the highly complex data type conversion rules for date, time,
and datetime values, and single/double quoted strings that need escaping
▪ Generally handled by privileged users like DBAs, not app developers
▪ Solution can be quite expensive unless you already have replication tool
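The idea of using replication as the movement engine can be modelled as shipping an ordered change log and applying it to the target. The sketch below is a simplified, assumption-laden model of asynchronous replication: the log format, the `apply_changes` helper and the dictionary standing in for the target database are all hypothetical, whereas real replication tools typically read the database's own transaction log.

```python
def apply_changes(replica, change_log, last_applied):
    """Apply log entries with a sequence number above last_applied.

    Each entry is (sequence, operation, key, value). Returns the new
    high-water mark so the next call can resume where this one stopped.
    """
    for seq, op, key, value in change_log:
        if seq <= last_applied:
            continue  # already applied on an earlier pass
        if op == "delete":
            replica.pop(key, None)
        else:  # "insert" and "update" are both treated as upserts
            replica[key] = value
        last_applied = seq
    return last_applied


change_log = [
    (1, "insert", "a", 10),
    (2, "insert", "b", 20),
    (3, "update", "a", 11),
    (4, "delete", "b", None),
]

replica = {}
# First pass: only the first two changes have reached the replica so far.
position = apply_changes(replica, change_log[:2], 0)
snapshot = dict(replica)
# Later pass: the replica catches up from its saved position.
position = apply_changes(replica, change_log, position)
print(position, replica)  # 4 {'a': 11}
```

The gap between the two passes is the replication lag; a synchronous setup would not acknowledge a source write until the target had applied it.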
CLOUD SOLUTION – DATABASE VENDOR – SQL SERVER
▪ SQL Server 2016 introduced the “stretch database” concept (to Azure)
▪ Relatively simple syntax to define rules for what portions of the data are
placed in the cloud (note that this is not just one time but rather ongoing)
▪ SQL Server query optimizer makes this data split totally transparent to all
applications (so cloud adoption can easily be rolled out incrementally)
▪ There are some limitations (e.g. data types, index types, table types, etc.)
▪ Can be expensive: space + compute + DSU (a database stretch unit
represents the power of the query and is quantified by your workload
objectives: how fast rows are written, read and computed against)
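Conceptually, the stretch idea is a rule that decides which rows live remotely plus a query layer that hides the split. The sketch below is plain Python standing in for that idea, not SQL Server syntax or any Microsoft API; the `split_rows` and `query` helpers, the cutoff date and the row layout are all assumptions made for illustration.

```python
from datetime import date


def split_rows(rows, is_cold):
    """Partition rows into (local, remote) using a rule for 'cold' data."""
    local, remote = [], []
    for row in rows:
        (remote if is_cold(row) else local).append(row)
    return local, remote


def query(local, remote, predicate):
    """Answer a query over both locations so callers never see the split."""
    return [row for row in local + remote if predicate(row)]


orders = [
    {"id": 1, "order_date": date(2014, 3, 1), "total": 50},
    {"id": 2, "order_date": date(2017, 9, 9), "total": 75},
    {"id": 3, "order_date": date(2015, 6, 30), "total": 20},
]

cutoff = date(2016, 1, 1)  # hypothetical rule: older orders go to the cloud
on_premises, in_cloud = split_rows(orders, lambda r: r["order_date"] < cutoff)

big_orders = query(on_premises, in_cloud, lambda r: r["total"] >= 50)
print(sorted(r["id"] for r in big_orders))  # [1, 2]
```

Because the rule is evaluated continuously rather than once, rows keep moving to the remote side as they age past the cutoff.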
CLOUD SOLUTION – DATABASE VENDOR – ORACLE
▪ Oracle 12c R2 multitenant offers “relocate PDB” concept (to Oracle Cloud)
▪ Can be done online!!!
▪ Simple drag and drop operation to perform!!!
▪ Bidirectional – so can incrementally deploy or revert as needed
▪ May perform quicker than one would imagine – sometimes fastest option
▪ Requires that the source database be a PDB (i.e. a pluggable database)
▪ Cost – requires Enterprise Edition with the Multitenant option
CLOUD VENDOR SOLUTION – EXAMPLE AWS
▪ Import/Export service (mail your portable storage devices to Amazon)
▪ AWS Snowball Appliance (create snowball job, AWS sends portable
storage device, run snowball client to copy the data, ship back the
portable storage device to AWS, data automatically copied to your S3)
▪ AWS Glue (ETL tool for migrating cloud to cloud)
▪ AWS Kinesis Data Firehose (easiest way to load streaming cloud data into
cloud data stores and cloud analytics tools)
▪ There are more and Amazon is constantly creating new offerings …
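A back-of-the-envelope calculation shows why services that ship physical storage devices exist at all. The figures below are illustrative assumptions, not numbers from this presentation: a 100 TB data set, a dedicated link, and 80% sustained utilization of that link.

```python
def transfer_days(terabytes, megabits_per_second, utilization=0.8):
    """Estimate days needed to push a data set over a network link."""
    bits = terabytes * 10**12 * 8                      # decimal terabytes
    seconds = bits / (megabits_per_second * 10**6 * utilization)
    return seconds / 86_400                            # seconds per day


# Assumed scenario: 100 TB over a 1 Gbps link, then over a 100 Mbps link.
print(round(transfer_days(100, 1000), 1))  # 11.6
print(round(transfer_days(100, 100), 1))   # 115.7
```

When the estimate runs to weeks or months, copying to a device and shipping it can be the faster path.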
CLOUD VENDOR SOLUTION – EXAMPLE AZURE
▪ Import/Export service (mail your portable storage devices to Microsoft)
▪ Azure Data Factory (fully managed cloud-based data integration service)
▪ Azure PolyBase (fastest possible loading of Azure SQL Data Warehouse
leveraging the entire Massively Parallel Processing (MPP) architecture)
• PolyBase with T-SQL
• PolyBase with SSIS
• PolyBase with Azure Data Factory (ADF)
• PolyBase with Azure Databricks
▪ There are more and Microsoft is constantly creating new offerings …
DEMO
THANKS!
Any questions?
You can find me at:
bertscalzo2@gmail.com

IDERA Live | The Ever Growing Science of Database Migrations
