www.enkitec.com
Moving Data Between Oracle Exadata and Hadoop.
Fast.
Tanel Põder
Enkitec

http://www.enkitec.com
http://blog.tanelpoder.com
Intro: About me
•  Tanel Põder
•  Former Oracle Database Performance geek
•  Present Exadata Performance geek
•  Future Hadoop Performance geek
•  My Exadata experience
    •  2009 ... 2013
    •  Exadata V1 … X3
    •  Multi-rack Exadatas
    •  Mixed-rack Exadatas
•  My Hadoop experience
    •  Ask again next year ;-)

Expert Oracle Exadata book
(with Kerry Osborne and Randy Johnson of Enkitec)
About Enkitec
•  Enkitec
    •  North America
    •  EMEA

•  100+ staff
    •  In US, Europe
•  Consultants with Oracle experience of 15+ years on average
•  What makes us so awesome
    •  200 Exadata implementations to date

•  Enkitec Exa-Lab
    •  We have 3 Exadatas (V2, X2-2, X3-2)
    •  Full-Rack Big Data Appliance
    •  Exalytics
    •  ODA

Everything Exa: Planning/PoC, Implementation, Consolidation, Migration, Backup/Recovery, Patching, Troubleshooting, Performance, Capacity, Training
Our exa-lab environment
•  Exadata V2 (quarter rack)
•  Exadata X2-2 (quarter rack)
•  Exadata X3-2 (quarter rack)
•  Big Data Appliance (full rack)
•  Exalytics, ODA, etc.
Interconnect: InfiniBand (IB)
Disclaimers++
•  The+numbers+shown+here+are+not+from+"real"+benchmarks+
•  The+actual+data+loading+speeds+vary+greatly+when+using+real+data+
•  (column+count,+datatypes+etc+etc)+
•  This+is+not+a+"how+to+configure+hadoop+tools"+session+
•  ...it's+all+about+performance+
(Too) Many Data Loading Options
•  Pull Hadoop data into Oracle
    •  Oracle SQL Connector for HDFS
    •  Oracle Heterogeneous Services + Hive/Impala ODBC
    •  Fuse-mounted HDFS + external table load
•  Push Hadoop data into Oracle
    •  Sqoop
    •  Oracle Loader for Hadoop
•  Pull Oracle data into Hadoop
    •  Sqoop
    •  Tom Kyte's flat unloader (to Hadoop local filesystem + copy to HDFS)
Oracle SQL Connector for HDFS
CREATE TABLE "TANEL"."TERASORT_1T_100"
( "TOKEN_TYPE"  VARCHAR2(4000),
  "DATE_MONTH"  VARCHAR2(4000),
  "TOKEN_COUNT" VARCHAR2(4000),
  "TOKEN_VALUE" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY 0X'0A'
    PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
    FIELDS TERMINATED BY 0X'3058273927'
    ( "TOKEN_TYPE"  CHAR(4000),
      "DATE_MONTH"  CHAR(4000),
      "TOKEN_COUNT" CHAR(4000),
      "TOKEN_VALUE" CHAR(4000)
    )
  )
  LOCATION
  ( 'osch-tanel-00000',
    'osch-tanel-00001',
    'osch-tanel-00002',
    'osch-tanel-00003'
  )
) ...
Visible to Oracle as an external table.
Parallelizable: INSERT ... SELECT, CTAS.
The PREPROCESSOR program hdfs_stream is a Java program capable of reading/streaming files from HDFS.
The LOCATION clause lists the Oracle SQL Connector data "location pointer" files, which point to 1 TB of data.
OSCH data location files
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>	
<locationFile>	
<header>	
<version>1.0</version>	
<fileName>osch-20130708020324-4644-1</fileName>	
<createDate>2013-07-08T14:03:24</createDate>	
<publishDate>2013-07-08T02:03:24</publishDate>	
<productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>	
<productVersion>2.1.0</productVersion>	
</header>	
<uri_list>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00000	
</uri_list_item>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00006	
</uri_list_item>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00008	
</uri_list_item>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00014	
</uri_list_item>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00016	
</uri_list_item>	
...	
Each "location pointer" file the external table loader uses points to one or more actual HDFS files.

(This config file is edited for formatting purposes.)
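To check which HDFS files an external table will actually stream, the location files can be read back programmatically. The sketch below is a minimal illustration, assuming the XML layout shown above (a `uri_list` of `uri_list_item` elements whose text is the HDFS URI and whose `size` attribute is in bytes). The sample is a trimmed copy of that layout, and `list_hdfs_files` is a hypothetical helper, not part of the connector.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the location-file layout shown above (header shortened).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
<header>
<version>1.0</version>
<fileName>osch-tanel-00099</fileName>
</header>
<uri_list>
<uri_list_item size="10000000000" compressionCodec="">
hdfs://enkbda-ns/user/acolvin/terasort/part-00099
</uri_list_item>
</uri_list>
</locationFile>
"""


def list_hdfs_files(xml_bytes):
    """Return (uri, size_in_bytes) pairs for every uri_list_item in a location file."""
    root = ET.fromstring(xml_bytes)
    files = []
    for item in root.iter("uri_list_item"):
        uri = (item.text or "").strip()      # URI is the element text, padded by newlines
        size = int(item.get("size", "0"))    # size attribute, assumed to be in bytes
        files.append((uri, size))
    return files


if __name__ == "__main__":
    for uri, size in list_hdfs_files(SAMPLE.encode("utf-8")):
        print(f"{uri}  {size / 1e9:.1f} GB")
```

For a real file, pass the raw bytes, e.g. `list_hdfs_files(open("osch-tanel-00000", "rb").read())`. Passing bytes rather than a decoded string lets the XML parser honour the encoding declaration itself.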
Testing Oracle SQL Connector for HDFS
•  CREATE TABLE target AS
   SELECT /*+ PARALLEL */ * FROM terasort_1t;
Only 75 MB per second?
Where is your bottleneck?
[Diagram: Hadoop cluster (HDFS, MapReduce jobs running on CPU) -> network -> Oracle Database (PX slaves running on CPU, storage I/O)]
•  Hadoop side "computation": decompression, text file parsing, datatype conversion
•  Network: bandwidth / throughput / configuration
•  Oracle side: text file parsing? Datatype conversion? HCC compression? DB waits / contention?
The only way to know is to measure!
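One cheap place to start measuring is the network interface byte counters, similar to what dstat reports. A minimal sketch, assuming a Linux host where `/proc/net/dev` is available; it prints per-interface MB/s over one interval. It only sees traffic that passes through the kernel network stack, so treat it as a first check rather than a full picture. All function names are hypothetical.

```python
import os
import time


def read_net_bytes(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:          # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 = bytes received, field 8 = bytes transmitted.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mb_per_s(before, after, seconds):
    """Per-interface (rx, tx) throughput in MB/s between two counter snapshots."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) / seconds / 1e6, (tx1 - tx0) / seconds / 1e6)
    return rates


def sample(interval=1.0, path="/proc/net/dev"):
    """Take two snapshots about `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = read_net_bytes(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = read_net_bytes(f.read())
    return rates_mb_per_s(before, after, time.monotonic() - start)


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    for iface, (rx, tx) in sorted(sample().items()):
        print(f"{iface:10s} rx {rx:8.1f} MB/s   tx {tx:8.1f} MB/s")
```

The elapsed time is measured rather than assumed, so a late wake-up from `sleep` does not inflate the rates.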
Testing Oracle SQL Connector for HDFS
Unbalanced Parallel Slave activity?
Increase Max Allowed External Table Parallelism
CREATE TABLE terasort_1t_100 (	
...	
ORGANIZATION EXTERNAL	
( TYPE ORACLE_LOADER	
DEFAULT DIRECTORY "EXT_HDFS_TEST_DIR"	
...	
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'	
...	
LOCATION	
(	
'osch-tanel-00000'	
, 'osch-tanel-00001'	
, 'osch-tanel-00002'	
, 'osch-tanel-00003'	
, 'osch-tanel-00004'	
, 'osch-tanel-00005'	
, 'osch-tanel-00006'	
, 'osch-tanel-00007'	
, 'osch-tanel-00008'	
, 'osch-tanel-00009'	
, 'osch-tanel-00010'	
...	
, 'osch-tanel-00098'	
, 'osch-tanel-00099'	
)	
...	
Solution: create more "location pointer" files.
100 "location pointer" files, each pointing to a single HDFS file (in my test).
This allows up to 100 slaves to run in parallel, each accessing one HDFS stream.
More "fine-grained" OSCH data location files
$ cat osch-tanel-00099 	
	
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>	
<locationFile>	
<header>	
<version>1.0</version>	
<fileName>osch-tanel-00099</fileName>	
<createDate>2013-07-08T14:03:24</createDate>	
<publishDate>2013-07-08T02:03:24</publishDate>	
<productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>	
<productVersion>2.1.0</productVersion>	
</header>	
<uri_list>	
<uri_list_item size="10000000000" compressionCodec="">	
hdfs://enkbda-ns/user/acolvin/terasort/part-00099	
</uri_list_item>	
</uri_list>	
</locationFile>	
	
$ ls -l osch-tanel*	
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00000	
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00001	
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00002	
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00003	
...	
-rwxr-xr-x 1 nobody users 598 Sep 24 12:07 osch-tanel-00099	
	
100 files, allowing up to 100 HDFS streams in parallel.

With fewer PX slaves, each slave can access multiple files sequentially.
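Writing 100 location files by hand is tedious, so a small script helps. A minimal sketch, assuming the XML layout shown above and that the HDFS part files are named `part-00000` ... `part-00099` (only `part-00099` appears on the slide). `render_location_file`, `write_location_files` and `location_clause` are hypothetical helpers; the connector's own tooling may offer a supported way to produce more location files.

```python
from pathlib import Path

# Layout copied from the location file shown above; one HDFS file per location file.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<locationFile>
<header>
<version>1.0</version>
<fileName>{name}</fileName>
<createDate>{created}</createDate>
<publishDate>{published}</publishDate>
<productName>Oracle SQL Connector for HDFS Release 2.1.0 - Production</productName>
<productVersion>2.1.0</productVersion>
</header>
<uri_list>
<uri_list_item size="{size}" compressionCodec="">
{uri}
</uri_list_item>
</uri_list>
</locationFile>
"""


def render_location_file(name, uri, size, created="2013-07-08T14:03:24",
                         published="2013-07-08T02:03:24"):
    """Return the XML text of a location file pointing at a single HDFS file."""
    return TEMPLATE.format(name=name, uri=uri, size=size,
                           created=created, published=published)


def write_location_files(hdfs_files, out_dir, prefix="osch-tanel"):
    """Write one location file per (uri, size) pair; return the generated names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for i, (uri, size) in enumerate(hdfs_files):
        name = f"{prefix}-{i:05d}"
        (out / name).write_text(render_location_file(name, uri, size))
        names.append(name)
    return names


def location_clause(names):
    """Build the LOCATION (...) clause for the external table DDL."""
    return "LOCATION\n(\n  " + "\n, ".join(f"'{n}'" for n in names) + "\n)"
```

Usage, under the same naming assumption: build `parts = [(f"hdfs://enkbda-ns/user/acolvin/terasort/part-{i:05d}", 10_000_000_000) for i in range(100)]`, call `write_location_files(parts, dir_path)` with the OS path behind `EXT_HDFS_TEST_DIR` (location files are resolved against the table's default directory), then paste `location_clause(names)` into the DDL.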
BDA -> Exadata X3-2 (16 cores/32 threads) 1 TB data load:
500-600 MB/s load by a single DB node (roughly 1.8-2.2 TB per hour)
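The gap between the first test (75 MB/s) and this result is easier to appreciate as load time. A small worked example, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the rates are the ones reported in these slides.

```python
SECONDS_PER_HOUR = 3600


def tb_per_hour(mb_per_s):
    """Convert a sustained MB/s rate into TB/hour (decimal units)."""
    return mb_per_s * SECONDS_PER_HOUR / 1_000_000


def minutes_per_tb(mb_per_s):
    """Minutes needed to load 1 TB (1,000,000 MB) at a sustained MB/s rate."""
    return 1_000_000 / mb_per_s / 60


for rate in (75, 500, 600):   # first OSCH test vs. the tuned load
    print(f"{rate:4d} MB/s -> {tb_per_hour(rate):.2f} TB/hour, "
          f"{minutes_per_tb(rate):.0f} minutes per TB")
```

At 75 MB/s a terabyte takes about 222 minutes (3.7 hours); at 500-600 MB/s it takes roughly 28-33 minutes.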
BDA -> Exadata X3-2 (16 cores/32 threads) 1 TB data load:
Skewed/unbalanced parallel execution: 4 slaves keep working after the others are done (3 x 32 + 4 = 100 files)
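The tail follows directly from the file count. A minimal simulation, assuming all files are equally sized (true for the 10 GB terasort parts) and that each PX slave simply picks up the next unprocessed file when it finishes; Oracle's actual granule scheduling may differ, and `files_per_slave` / `summarize` are hypothetical helpers.

```python
def files_per_slave(n_files, n_slaves):
    """Hand out equal-sized files one at a time in slave order; return files per slave."""
    counts = [0] * n_slaves
    for f in range(n_files):
        counts[f % n_slaves] += 1
    return counts


def summarize(n_files, n_slaves):
    """Return (rounds, straggler slaves, fraction of slave time doing useful work)."""
    counts = files_per_slave(n_files, n_slaves)
    rounds = max(counts)                     # elapsed time in "one file" units
    stragglers = counts.count(rounds) if rounds > min(counts) else 0
    utilisation = n_files / (rounds * n_slaves)
    return rounds, stragglers, utilisation


for files in (100, 96, 128):
    print(files, "files on 32 slaves:", summarize(files, 32))
```

With 100 files on 32 slaves, 4 slaves need a fourth round while the other 28 sit idle, so only about 78% of the available slave time does useful work. Under the equal-size assumption, a file count that is a multiple of the degree of parallelism (96 or 128 here) has no such tail.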
Hadoop cluster CPUs are idle?!
Drilling deeper into the CPU usage
SQL> @ostackprof 788 0.1 100	
	
Below is the stack prefix common to all samples:	
------------------------------------------------------------------------	
Frame->function()	
------------------------------------------------------------------------	
# 49 ->main()	
.... some lines snipped .....	
# 11 ->pextproc()	
# 10 ->spefmccallstd()	
# 9 ->spefcpfa()	
# 8 ->qxxqFetch()	
# 7 ->kpxsFetch()	
# 6 ->kpxsFetchField()	
# 5 ->kpxsFetchDriver()	
.... some lines snipped .....	
# -#--------------------------------------------------------------------	
# - Num.Samples -> in call stack()	
# ----------------------------------------------------------------------	
35 ->kudmxfe()->kudmdtp()->lxoSchPat()	
25 ->kudmxfe()->kudmdtp()->lxmfwdx()	
23 ->kudmxfe()->kudmdtp()->	
4 ->kpxsDoConvert()->OCIDirPathColArrayToStream()->kpudpcs_colArrayToStream()->kpudpcsf_intColArrayToStream()
3 ->kudmxfe()->lxmfwdx()	
3 ->kudmxfe()->kudmrn()->kudmrt()	
2 ->qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()	
2 ->qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()	
1 ->OCIDirPathColArrayReset()	
83% of time spent in datatype conversion (kudm)

60% in lx* functions: string/datatype processing
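These percentages can be reproduced from the sample counts in the profile. A small sketch that tallies samples by frame name; it assumes the last `ostackprof` argument (100) is the number of samples taken. The counts are copied from the output; the listed stacks add up to 98, so two samples are not shown.

```python
# (samples, call stack) pairs copied from the ostackprof output above.
SAMPLES = [
    (35, "kudmxfe()->kudmdtp()->lxoSchPat()"),
    (25, "kudmxfe()->kudmdtp()->lxmfwdx()"),
    (23, "kudmxfe()->kudmdtp()->"),
    (4, "kpxsDoConvert()->OCIDirPathColArrayToStream()->kpudpcs_colArrayToStream()"
        "->kpudpcsf_intColArrayToStream()"),
    (3, "kudmxfe()->lxmfwdx()"),
    (3, "kudmxfe()->kudmrn()->kudmrt()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->qeaeCn1Serial()"),
    (2, "qerxtCBFetch()->qerxtProcessRows()->klxprParseRow()"),
    (1, "OCIDirPathColArrayReset()"),
]
TOTAL_SAMPLES = 100  # assumption: ostackprof was asked for 100 samples


def frames(stack):
    """Split 'a()->b()->' into ['a', 'b']."""
    return [f.rstrip("()") for f in stack.split("->") if f]


def samples_where(predicate):
    """Count samples whose stack contains at least one frame matching `predicate`."""
    return sum(n for n, stack in SAMPLES if any(predicate(f) for f in frames(stack)))


conversion = samples_where(lambda f: f == "kudmdtp")
lx_funcs = samples_where(lambda f: f.startswith("lx"))
print(f"kudmdtp: {conversion / TOTAL_SAMPLES:.0%}, lx*: {lx_funcs / TOTAL_SAMPLES:.0%}")
```

83 of 100 samples have `kudmdtp()` on the stack, matching the 83% above; the `lx*` frames account for 63 samples, close to the ~60% quoted.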
Datatype conversion is CPU hungry!!!
You can offload the "preprocessing and datatype conversion" to the Hadoop cluster CPUs with the Oracle Loader for Hadoop!
Oracle Loader for Hadoop
[Diagram: Hadoop cluster (HDFS, MapReduce jobs running on CPU) -> Oracle Database (DB process, storage I/O)]
•  Load paths: array insert (JDBC), direct path load (OCI), or create a DataPump file (load via external table)
•  With OCI/DataPump it's possible to convert the data to Oracle native format: no datatype conversion needed in the database
•  Open questions on the Oracle side: HCC compression? DB waits / contention?
Already pre-converted data is sent to Oracle.
•  Source: High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
    •  June 2012
    •  http://www.oracle.com/technetwork/bdc/hadoop-loader/connectors-hdfs-wp-1674035.pdf
Based on earlier tests, these numbers are plausible (although your mileage will vary depending on the data you convert and load).
Oracle Loader for Hadoop
•  Can preprocess and convert datatypes to Oracle "native" format using the Hadoop cluster's CPU cycles
    •  DataPump format
    •  OCI Direct Path load format
•  Each reducer in Hadoop connects to the Oracle DB with a separate session (OCI/JDBC)
    •  So OCI direct path loads must be done into partitioned tables!
    •  Otherwise you'll get TM enqueue contention
•  Oracle Loader for Hadoop takes care of the distribution
    •  As long as you have enough reducers configured
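The partitioning requirement can be illustrated with a toy plan that gives every target partition to exactly one reducer, so no two reducer sessions ever load into the same partition. This is a minimal sketch of the idea only, not how Oracle Loader for Hadoop implements its distribution; the partition names and `plan_reducers` / `busy_reducers` are hypothetical.

```python
def plan_reducers(partitions, n_reducers):
    """Give every target partition to exactly one reducer, round-robin."""
    plan = {r: [] for r in range(n_reducers)}
    for i, part in enumerate(partitions):
        plan[i % n_reducers].append(part)
    return plan


def busy_reducers(plan):
    """Number of reducers that actually receive a partition to load."""
    return sum(1 for parts in plan.values() if parts)


partitions = [f"P{i:02d}" for i in range(8)]   # hypothetical partition names
for reducers in (2, 8, 16):
    plan = plan_reducers(partitions, reducers)
    print(f"{reducers:2d} reducers: {busy_reducers(plan)} loading, "
          f"up to {max(len(p) for p in plan.values())} partitions each")
```

With fewer reducers than partitions, each session loads several partitions one after another; with more reducers than partitions, the extra reducers have nothing to load. In both cases no partition is shared between sessions.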
References
•  OTN Big Data Connectors page
    •  http://www.oracle.com/technetwork/bdc/big-data-connectors/overview/index.html
•  Oracle Big Data Connectors User's Guide
    •  http://docs.oracle.com/cd/E41604_01/doc.22/e41238/toc.htm
•  Tools
    •  dstat: http://dag.wieers.com/home-made/dstat/
    •  SwingBench CPU Monitor: http://www.dominicgiles.com/cpumonitor.html
Thanks!!!
•  Questions?
    •  Ask now :)
•  Or contact
    •  tanel@tanelpoder.com
    •  http://blog.tanelpoder.com
    •  @tanelpoder

•  http://www.enkitec.com
•  We rock! ;-)
