Offload, Transform, and Present - the New World of Data Integration
Michael Rainey
2
Introduction
• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Data Integration expertise
• Oracle ACE Director
• @mRainey
we liberate enterprise data
3
About Gluent
Globally distributed team with a deep background in building high performance enterprise applications and systems …
Now on a mission to liberate enterprise data.
Gluent Data Platform: Advisor, Open Data Formats, Security, Orchestration, Access, Offload, Present
2nd place in Strata+HadoopWorld Startup Showcase 2017. (Demo video at gluent.com)
Recognized by Gartner in their Cool Vendors in Data Management, 2017 report! Download (no registration needed): https://gluent.com/cool-vendor-2017/
Listed as one of the “Top 10 Coolest Big Data Startups of 2017” by CRN: gluent.com/gluent-recognized-in-top-10-coolest-startups-of-2017-on-crn
4
Gluent Webinars - JULY 2017
Gluent is running one-hour webinars each Wednesday in July beginning tomorrow, July 12. See details below and register for your free spot today!

Apache Impala Internals
Speaker: Tanel Poder, Gluent
Wednesday, July 19 @ 12 PM CDT
gluent.com/event/gluent-webinar-apache-impala-internals-with-tanel-poder

Building an Analytics Platform with Oracle & Hadoop
Speakers: Gerry Moore & Suresh Irukulapati, Vistra Energy
Wednesday, July 26 @ 9 AM CDT
gluent.com/event/gluent-webinar-building-an-integrated-analytics-platform-with-oracle-and-hadoop
5
Extract Transform Load (ETL)
Diagram labels: Source 1, Source 2, Source X, Extract, Transform, Load, EDW
6
ETL Examples - employee data
Diagram labels: EDW, HR, Network, Facilities, Security, Training, Employee
7
ETL Examples - IoT from set-top boxes
Diagram labels: IoT, EDW, CRM, Enhanced customer data
8
ETL Examples - data scientist pipelines
9
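Typical ETL development - too much time spent on “E” and “L”
// Transform: derive AvailableCredit, then drop the two fields it was computed from
reader = new TransformingReader(reader)
    .add(new SetCalculatedField("AvailableCredit",
        "parseDouble(CreditLimit) - parseDouble(Balance)"))
    .add(new ExcludeFields("CreditLimit", "Balance"));
// Load: write the transformed records to the dp_credit_balance table over JDBC
DataWriter writer = new JdbcWriter(getJdbcConnection(), "dp_credit_balance")
    .setAutoCloseConnection(true);
// Run the job: read from the source reader, write through the JDBC writer
JobTemplate.DEFAULT.transfer(reader, writer);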
ETL Developer / Data Engineer should spend their time on Transformations!
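To make the transformation step concrete, here is a minimal plain-Java sketch of what the calculated field does to a single record. It deliberately avoids the pipeline library used on the slide; the class name, the `Account` field, and the sample values are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of the transformation step only. It does not use the
// pipeline library from the slide, and the sample record is hypothetical.
final class AvailableCreditTransform {

    // Adds AvailableCredit = CreditLimit - Balance, then drops the two source fields.
    static Map<String, Object> transform(Map<String, String> record) {
        double creditLimit = Double.parseDouble(record.get("CreditLimit"));
        double balance = Double.parseDouble(record.get("Balance"));

        Map<String, Object> out = new LinkedHashMap<>(record);
        out.put("AvailableCredit", creditLimit - balance);
        out.remove("CreditLimit");
        out.remove("Balance");
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("Account", "A-100");
        record.put("CreditLimit", "5000.00");
        record.put("Balance", "1250.50");
        System.out.println(transform(record));
    }
}
```

The business rule is a handful of lines; the reader and writer plumbing around it is the "E" and "L" effort the slide is pointing at.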
10
We still need to move data!
11
Data lake or data hub?
• Data lake - enterprise data stored in raw form
• Data hub - centralized data store with standards, governance, and quality
12
What about data federation?
• Sharing data without moving data
• Metadata is stored about each “source” database
• Queries are passed through a metadata layer, which provides info about where the data lives and how to translate the query
• But…
  • We must recode applications!
  • Data remains in RDBMS silos and is limited by CPU/storage constraints
Diagram labels: Source 1, Source 2, Source X, App 1, App 2
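The metadata layer described on the federation slide can be pictured roughly as a catalog lookup followed by query routing. The sketch below is a hypothetical illustration, not any federation product's API; the class, the table names, and the generated SQL text are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a federation metadata layer: a catalog records which
// source database holds each table and what the table is called there.
final class FederationCatalog {

    // Where a logical table lives and its physical name in that source.
    static final class Location {
        final String source;
        final String physicalName;

        Location(String source, String physicalName) {
            this.source = source;
            this.physicalName = physicalName;
        }
    }

    private final Map<String, Location> tables = new HashMap<>();

    void register(String logicalTable, String source, String physicalName) {
        tables.put(logicalTable, new Location(source, physicalName));
    }

    // "Translates" a simple query by pointing it at the owning source.
    // The data itself is never copied; only the query is routed.
    String route(String logicalTable, String whereClause) {
        Location loc = tables.get(logicalTable);
        if (loc == null) {
            throw new IllegalArgumentException("Unknown table: " + logicalTable);
        }
        return loc.source + ": SELECT * FROM " + loc.physicalName + " WHERE " + whereClause;
    }

    public static void main(String[] args) {
        FederationCatalog catalog = new FederationCatalog();
        catalog.register("employees", "Source 1", "HR.EMPLOYEES");
        catalog.register("badges", "Source 2", "SEC.BADGE_LOG");
        System.out.println(catalog.route("employees", "dept_id = 10"));
    }
}
```

Note that the query still executes inside the source database, which is why the data stays bound to that system's CPU and storage limits.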
13
Three things we need the ability to do:
1. Offload data from Oracle to Hadoop
2. Load data from Hadoop to Oracle
3. Query Hadoop data in Oracle
14
Offload, not Extract
• “Offload” - copy or move data from the relational database to Hadoop
• Why offload?
  • Store data in a centralized location for access and sharing
  • Use the distributed, parallel processing power of Hadoop for transformations
  • Enable the use of “new world” technologies (Spark, Impala, etc.)
• Why Hadoop?
  • We can now afford to keep a copy of all enterprise data for data sharing reasons!
15
How to offload? There are many options
Tool | Offload Data
Sqoop | Yes
Oracle Loader for Hadoop |
Oracle SQL Connector for HDFS |
ODBC Gateway |
Big Data SQL |
Gluent Data Platform | Yes
16
Bulk data load with Sqoop
• Command line client used to bulk copy data from a relational database to HDFS over a JDBC connection
• Also works in reverse, HDFS to RDBMS
• Sqoop generates MapReduce jobs for the work
# Sqoop requires the $CONDITIONS token in a free-form --query import
sqoop import --connect jdbc:oracle:thin:@myserver:1521/MYDB1 \
  --username myuser \
  --null-string '' \
  --null-non-string '' \
  --target-dir=/user/hive/warehouse/ssh/sales \
  --append -m1 --fetch-size=5000 \
  --fields-terminated-by ',' --lines-terminated-by '\n' \
  --optionally-enclosed-by '"' --escaped-by '"' \
  --split-by TIME_ID \
  --query "SELECT * FROM SSH.SALES WHERE TIME_ID < DATE'1998-01-01' AND \$CONDITIONS"
17
Offload data with Gluent Data Platform
• Gluent Advisor helps determine which tables and/or partitions are candidates for offload
• Offload options:
  • Move or copy 100% of data
  • Move “cold” partitions only
• Data can be kept in sync with an incremental offload
  • Example: when a partition goes inactive (cold), it can be offloaded to Hadoop
• Data can be updated from RDBMS to Hadoop
80% - 20%
offload -x -t EDW.BALANCE_DETAIL_MONTHLY --less-than-value=2017-05-01
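The "cold partitions only" option can be thought of as a date cut-off applied to range partitions. The sketch below is a hypothetical model of that selection, assuming each partition holds rows strictly below its upper bound; it is not a description of how the offload command is implemented, and the partition names are invented.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of choosing "cold" range partitions for offload.
// It illustrates the idea only and is not how any offload tool is implemented.
final class ColdPartitionSelector {

    // A range partition holding rows with dates strictly below highValue.
    static final class Partition {
        final String name;
        final LocalDate highValue;

        Partition(String name, LocalDate highValue) {
            this.name = name;
            this.highValue = highValue;
        }
    }

    // Returns the partitions whose rows all fall below the threshold date.
    static List<String> coldPartitions(List<Partition> partitions, LocalDate lessThan) {
        List<String> cold = new ArrayList<>();
        for (Partition p : partitions) {
            if (!p.highValue.isAfter(lessThan)) {
                cold.add(p.name);
            }
        }
        return cold;
    }

    public static void main(String[] args) {
        List<Partition> partitions = Arrays.asList(
            new Partition("P_2017_03", LocalDate.of(2017, 4, 1)),
            new Partition("P_2017_04", LocalDate.of(2017, 5, 1)),
            new Partition("P_2017_05", LocalDate.of(2017, 6, 1)));
        System.out.println(coldPartitions(partitions, LocalDate.parse("2017-05-01")));
    }
}
```

With a threshold of 2017-05-01, the two partitions that end on or before that date qualify, and the current month stays in the RDBMS.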
18
Three things we need the ability to do:
1. Offload data from Oracle to Hadoop
2. Load data from Hadoop to Oracle
3. Query Hadoop data in Oracle
Present
19
The “L” of ETL - Loading the data
• Sqoop in reverse
• Reads HDFS files, converts to JDBC arrays and insert (or update) statements
Diagram labels: RDBMS
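The "files to insert statements" step on the loading slide can be sketched as follows. This is a simplified illustration rather than Sqoop's actual code: it assumes plain delimited text with no quoting or escaping, and the table and column names are made up. In a real loader, each array would typically be bound to a JDBC `PreparedStatement` and sent in batches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Simplified illustration of turning delimited file lines into a parameterized
// INSERT plus one array of bind values per row. Quoting and escaping are ignored.
final class DelimitedToInsert {

    // Builds a parameterized INSERT for the given table and columns.
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
            + ") VALUES (" + placeholders + ")";
    }

    // Splits each line on the delimiter; the -1 limit keeps trailing empty fields.
    static List<String[]> toBindArrays(List<String> lines, String delimiter) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(Pattern.quote(delimiter), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical table, columns, and file contents.
        System.out.println(insertSql("SALES", Arrays.asList("PROD_ID", "AMOUNT")));
        for (String[] row : toBindArrays(Arrays.asList("13,100.50", "14,"), ",")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Every row still has to be read, converted, and written into the database, which is the cost the next slides argue against.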
20
Forget the “L”oad. Present the data to the RDBMS
Diagram labels: RDBMS
21
Load the data from Hadoop to RDBMS? Present
Tool | Offload Data | Load Data
Sqoop | Yes | Yes
Oracle Loader for Hadoop | | Yes
Oracle SQL Connector for HDFS | | Yes
ODBC Gateway | | Yes
Big Data SQL | | Yes
Gluent Data Platform | Yes | Present
22
Present data to Oracle
• Oracle SQL Connector for HDFS
  • Uses Oracle External Table to access delimited file or Hive table in Hadoop
  • When queried, all data is read from Hadoop to Oracle, then processed
• Oracle to Hadoop over database links
  • Create a database link from Oracle to Hadoop using the Impala (or Hive) ODBC gateway
  • Pushes filters to Hadoop but not grouping aggregates
• Big Data SQL
  • Access Hive tables via an Oracle external table
  • Predicate pushdown (not aggregations & joins)
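The difference between "all data is read, then processed" and predicate pushdown comes down to where the filter runs. The following sketch simulates both with in-memory lists; the class and method names are hypothetical and no real connector is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// In-memory simulation of filtering with and without predicate pushdown.
// "Remote" stands in for Hadoop and "shipped" for rows sent to the database.
final class PushdownDemo {

    static final class Result {
        final int rowsTransferred;
        final List<Integer> rows;

        Result(int rowsTransferred, List<Integer> rows) {
            this.rowsTransferred = rowsTransferred;
            this.rows = rows;
        }
    }

    // Without pushdown: every remote row is shipped, then filtered locally.
    static Result withoutPushdown(List<Integer> remote, Predicate<Integer> filter) {
        List<Integer> shipped = new ArrayList<>(remote);
        List<Integer> kept = new ArrayList<>();
        for (Integer value : shipped) {
            if (filter.test(value)) {
                kept.add(value);
            }
        }
        return new Result(shipped.size(), kept);
    }

    // With pushdown: the filter runs on the remote side, so only matches are shipped.
    static Result withPushdown(List<Integer> remote, Predicate<Integer> filter) {
        List<Integer> shipped = new ArrayList<>();
        for (Integer value : remote) {
            if (filter.test(value)) {
                shipped.add(value);
            }
        }
        return new Result(shipped.size(), shipped);
    }

    public static void main(String[] args) {
        List<Integer> amounts = Arrays.asList(5, 50, 500, 5000);
        Predicate<Integer> large = v -> v > 100;
        System.out.println("no pushdown, rows shipped: " + withoutPushdown(amounts, large).rowsTransferred);
        System.out.println("pushdown, rows shipped: " + withPushdown(amounts, large).rowsTransferred);
    }
}
```

Both paths return the same rows; only the volume crossing the link changes. The same reasoning extends to aggregations and joins, where the gap can be much larger.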
23
Gluent Present
present -t "SH.SALES" -xv --target-name="SH.SALES_DIVISION_X"
Gluent Present can push down predicates, aggregations, and joins
24
Three things we need the ability to do:
1. Offload data from Oracle to Hadoop
2. Load data from Hadoop to Oracle
3. Query Hadoop data in Oracle
Present
25
How to query the data? There are many options
Tool | Offload Data | Load Data | Allow Query Data | Offload Query | Parallel Execution
Sqoop Yes Yes Yes
Oracle Loader for Hadoop Yes Yes
Oracle SQL Connector for HDFS Yes Yes Yes
ODBC Gateway Yes Yes Yes
Big Data SQL Yes Yes Yes Yes
Gluent Data Platform Yes Yes Yes Yes Present
26
Present Data From Anywhere To Anywhere
27
Offload Transform Present (OTP?)
• A new approach…a new acronym?
Diagram labels: EDW
28
ETL Examples - employee data
Diagram labels: EDW, HR, Network, Facilities, Security, Training, Employee
29
Now using OTP - employee data
Diagram labels: EDW, Employee
30
ETL Examples - IoT from set-top boxes
Diagram labels: IoT, EDW, CRM, Enhanced customer data
31
ETL Examples - data scientist pipelines
32
OTP simplifies the process of data movement!
• No upfront massive project cost to move data around.
  • Can happen in minutes / hours, not weeks / months
  • Large corporations can move as fast as startups
  • Data Engineers / ETL Developers can focus on transformations!
• A new approach to data sharing across the enterprise
  • Offload all data for simple access
  • Takes a different mindset
33
Data Consolidation
Sources: SALES (customer A) and SALES (customer B), each with Customers, Products, Preferences, Promotions, Prices
Offload (Gluent): Customers A, Customers B -> raw data
Transform (optional; Union all, SQL, Spark): Customers ALL -> consolidated data
Present (Gluent): SALES (ALL) with Customers ALL, Products ALL, Preferences ALL, Promotions, Prices -> virtualized tables, queried with SQL
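The optional transform in the consolidation flow is, at its simplest, a union of the offloaded per-customer tables. Here is a small in-memory sketch of that step, assuming the rows are just customer names; tagging each row with its origin is an addition made for illustration and is not shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory sketch of consolidating two offloaded customer tables.
// Each output row is {origin, name}; the origin tag is an illustrative assumption.
final class ConsolidateCustomers {

    // UNION ALL semantics: concatenates both inputs and keeps duplicates.
    static List<String[]> unionAll(List<String> customersA, List<String> customersB) {
        List<String[]> all = new ArrayList<>();
        for (String name : customersA) {
            all.add(new String[] {"A", name});
        }
        for (String name : customersB) {
            all.add(new String[] {"B", name});
        }
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical customer names.
        List<String[]> all = unionAll(
            Arrays.asList("Acme", "Globex"),
            Arrays.asList("Globex"));
        for (String[] row : all) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because the union keeps duplicates, a customer present in both sources appears twice; removing such duplicates would be a separate transformation.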
34
we liberate enterprise data
thank you!