Habits of Effective Sqoop Users
Kate Ting, Customer Operations Engineer
kate@cloudera.com
Halp! Sqoop doesn't work!

    Now what?


2
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

3
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

4
First Things First
    Save time by providing this upfront:
    •  Versions: Sqoop, Hadoop, OS, JDBC
    •  Run with --verbose flag then attach log
    •  Sqoop command including options-file
    •  Expected output vs. actual output
    •  Table definition
    •  Input data set that triggers problem
    •  Hadoop task logs
    •  Check permissions on input files
    •  Divide and conquer
       –  e.g. an import that creates and populates a Hive table is failing
           •  First, do the import alone
           •  Second, create the Hive table without the import, using the
              create-hive-table tool (see the sketch below)
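
    A minimal sketch of the two steps above, assuming a hypothetical MySQL
    database db.example.com/test and table orders (connection details are
    placeholders, not from the original deck):

       # Step 1: run the plain import on its own
       $ sqoop import --connect jdbc:mysql://db.example.com/test \
           --username testuser --password testpassword --table orders --verbose

       # Step 2: create the Hive table separately with the create-hive-table tool
       $ sqoop create-hive-table --connect jdbc:mysql://db.example.com/test \
           --username testuser --password testpassword --table orders \
           --hive-table orders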



5
Common Problems


6
Agenda

    •  First Things First
    •  Common Problems
    •  MySQL
      –  Connection Failure
      –  Importing into Hive
    •  Oracle
      –  Case-Sensitive Catalog Query Errors
      –  Sqoop Export Failing
    •  Effective Sqoop Habits

7
MySQL: Connection Failure
    java.lang.RuntimeException: java.lang.RuntimeException:
    com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

    The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received
    any packets from the server.
    at com.cloudera.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:164)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:606)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)




8
MySQL: Connection Failure
    •    Problem: Communications Link Failure caused by incorrect permissions.
    •    Solution:
          –  Verify that you can connect to the database from the node where you
             are running Sqoop:
                •  $ mysql --host=<IP Address> --database=test --user=<username>
                   --password=<password>
           –  Add the network port for the server to your my.cnf file (see the
              sketch below)
           –  Set up a user account to connect via Sqoop. Grant permissions to the
              user to access the database over the network:
                •  Log into MySQL as root: mysql -u root -p<ThisIsMyPassword>
                •  Issue the following command:
                   mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword';
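
           A sketch of those pieces, assuming MySQL's default port 3306 (the
           bind-address line is an assumption, not from the original deck):

              # my.cnf: make sure mysqld listens on a network port
              [mysqld]
              port = 3306
              bind-address = 0.0.0.0   # assumption: allow remote connections

              # then point Sqoop at that host and port with the granted account
              $ sqoop import --connect "jdbc:mysql://<IP Address>:3306/test" \
                  --username testuser --password testpassword --table <tablename>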




9
MySQL: Importing into Hive
 •  Troubleshooting tips:
     –  Look at /tmp/${user}/hive.log
        •  Identifies exceptions during the load
     –  Look at /user/hive/warehouse
        •  View contents of the imported data
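
     The checks above map to commands like these (paths assume the default Hive
     log and warehouse locations; the table name is a placeholder):

        # exceptions raised during the Hive load
        $ tail -100 /tmp/${USER}/hive.log

        # contents of the imported data in the Hive warehouse
        # (part-m-00000 is the typical Sqoop map-task output file name)
        $ hadoop fs -ls /user/hive/warehouse/<tablename>
        $ hadoop fs -cat /user/hive/warehouse/<tablename>/part-m-00000 | head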




10
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

11
Oracle: Case-Sensitive Catalog Query Errors
 INFO manager.OracleManager: Time zone has been set to GMT
 DEBUG manager.SqlManager: Using fetchSize for next query: 1000
 INFO manager.SqlManager: Executing SQL statement:
 SELECT t.* FROM addlabel_pris t WHERE 1=0
 DEBUG manager.OracleManager$ConnCache: Caching
 released connection for jdbc:oracle:thin:
 ERROR sqoop.Sqoop: Got exception running Sqoop:
 java.lang.NullPointerException
 java.lang.NullPointerException
 at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:148)
 at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:187)
 at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
 at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
 at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
 at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
 at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)




12
Oracle: Case-Sensitive Catalog Query Errors

 •  Problem: NPE caused by using the wrong
    case for the user name and table name.
 •  Solution: Always specify the user and table
    names in upper case (unless they were
    created with mixed/lower case within
    quotes). See the example below.
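
 For example, a hypothetical Oracle import with the user and table names
 upper-cased (host, service, and credentials are placeholders):

    $ sqoop import --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
        --username SCOTT --password tiger --table EMPLOYEES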




13
Oracle: Sqoop Export Failing
 INFO mapred.JobClient: Running job: job_201109231340_0785
 INFO mapred.JobClient: map 0% reduce 0%
 INFO mapred.JobClient: Task Id :
 attempt_201109231340_0785_m_000000_0, Status : FAILED
 java.lang.NullPointerException
 at com.cloudera.sqoop.mapreduce.db.DataDrivenDBRecordReader.getSelectQuery(DataDrivenDBRecordReader.java:87)
 at com.cloudera.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:225)
 at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)




14
Oracle: Sqoop Export Failing
 •  Problem: IllegalArgumentException
    caused by a non-owner trying to connect
    to the table.
 •  Solution: Prefix the table name with the
    schema, for example
    SchemaName.OracleTableName (see the
    sketch below).
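
 A sketch of such an export, assuming a hypothetical HR schema and export
 directory (all names are placeholders):

    $ sqoop export --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
        --username SQOOPUSER --password <password> \
        --table HR.EMPLOYEES --export-dir /user/sqoop/employees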




15
Agenda

 •  First Things First
 •  Common Problems
 •  MySQL
     –  Connection Failure
     –  Importing into Hive
 •  Oracle
     –  Case-Sensitive Catalog Query Errors
     –  Sqoop Export Failing
 •  Effective Sqoop Habits

16
Effective Sqoop Habits

 •  Do create an empty export table.

 •  Don’t use the same table for both import
    and export.
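
 One way to stage an empty export table (the first Do above), assuming MySQL
 and placeholder table names:

    mysql> CREATE TABLE orders_export LIKE orders;
    -- or, in databases without CREATE TABLE ... LIKE, copy the definition with no rows:
    mysql> CREATE TABLE orders_export AS SELECT * FROM orders WHERE 1=0;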




17
Effective Sqoop Habits

 •  Do use the --escaped-by option during import
    and --input-escaped-by during export.
 •  Do use --fields-terminated-by during import
    and --input-fields-terminated-by during
    export.

 •  Don’t reverse them (see the example below).
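
 For example, keeping the import and export delimiter options paired (the
 delimiter values and elided connection arguments are illustrative):

    # import: write files with these delimiters...
    $ sqoop import ... --fields-terminated-by ',' --escaped-by '\\'

    # export: ...parse them back with the matching input-* options
    $ sqoop export ... --input-fields-terminated-by ',' --input-escaped-by '\\'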



18
Effective Sqoop Habits

 •  Do specify the direct mode option
    (--direct) if you use the direct connector.

 •  Don’t specify a query if you use the direct
    connector (example below).
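
 For example (connection details are placeholders):

    # with the direct connector, name a table and pass --direct
    $ sqoop import --connect jdbc:mysql://db.example.com/test --table orders --direct

    # avoid combining --direct with a free-form --query;
    # specifying a query bypasses the direct connector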




19
How Do You Eat an Elephant?

 •  One bite at a time
     –  Versions
     –  Verbose flag
     –  Console log
     –  Exact command, etc.

 •  Sqoop Troubleshooting Guide
     –  http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_troubleshooting


20


Editor's Notes

  • #2: Sqoop does not guarantee intuitive error messages. But I guarantee that in the next ten minutes you will either learn or be reminded of a few tips to make your next debugging session more effective.
  • #20: Specifying the query bypasses the direct connector