© Cloudera, Inc. All rights reserved.
Road to Cloudera certification
The demand for skills is high and Hadoop is the future. Customers
cannot afford to move slowly in staffing their Big Data projects.
Customers are building plans to ensure projects are staffed with
skilled employees, and supported by a qualified services provider.
Chart: Job Trends from Indeed.com
Poll: What are you most concerned about when it comes to your readiness for big data and Hadoop?
Source: Cloudera MDP webinar poll results, July 2016
Why Cloudera training?
Aligned to best practices and the pace of change
1 Broadest range of courses
Learning paths for Developer, Admin, Analyst
2 Most experienced instructors
More than 50,000 trained since 2009
3 Leader in certification
Over 12,000 accredited Cloudera professionals
4 Trusted source for training
100,000+ people have attended online courses
5 State of the art curriculum
Courses updated as Hadoop evolves
6 Widest geographic coverage
Most classes offered: 50 cities worldwide plus online
7 Most relevant platform & community
CDH deployed more than all other distributions combined
8 Depth of training material
Hands-on labs and VMs support live instruction
9 Ongoing learning
Video tutorials and e-learning complement training
10 Commitment to big data education
University partnerships to teach Hadoop in colleges
What is available from Cloudera University?
• Private training: Course delivered at location of customer choice to internal audience
• Public training: Courses regularly scheduled around the globe. Schedule available on web
• Virtual training: Live training accessed via the internet; available for public and private courses
• OnDemand training: Pre-recorded lecture with identical content/exercises as live training options
• Certification: Rigorously developed and meaningful bodies of knowledge
Delivery options: OnDemand, Virtual live classroom, Private onsite, Public live classroom
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic specific training (Search, HBase)
• Hands on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Let’s get certified!
Certification Tiers
 CCA (Cloudera Certified Associate)
 Data Analyst, Admin and Spark & Hadoop Developer
 Basic exam – but it's a complex subject area
 Maps to curriculum
 CCP (Cloudera Certified Professional)
 Data Engineer
 Combination of Developer, Analyst and Big Data services
 Mastery level – beyond the introduction course
 Real world experience
Exam format CCA and CCP certification
 Not multiple choice
 Hands on, practical exams similar to student exercises
 Home based, no testing centres
 Proctored through ExamsLocal.com
 Webcam and desktop recorded and monitored
 No papers / phone / drinks on desk / no talking
 AWS Cloud-based cluster
 Guacamole remote desktop in web browser
 No Internet search during exam – only local documentation
Sample CCA question
 Instructions
 Connect to the MySQL database on the cluster using Sqoop and import all of the
data from the customer table into HDFS. The result must be comma delimited
text format and put into hdfs dir /user/cert/solution3
 Data Description
 A MySQL instance is running on the gateway node. In that instance, you will find
a table that contains twenty-five million (25,000,000) rows of customer data.
MySQL database information:
Installation: On the cluster node gateway
Table name: customer
Username: cloudera
Password: cloudera
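One way to approach the sample question above is a single Sqoop import. The sketch below only assembles the command in Python so the arguments can be checked before running it. The question does not name the MySQL database, so DATABASE is a hypothetical placeholder, and the helper name build_sqoop_import is illustrative. Passing the password on the command line is only reasonable on a practice cluster.

```python
import shlex
from typing import List

# Hypothetical placeholder: the sample question does not name the MySQL database.
DATABASE = "<database>"


def build_sqoop_import(database: str = DATABASE) -> List[str]:
    """Return the argument list for a Sqoop import matching the sample question."""
    return [
        "sqoop", "import",
        # MySQL runs on the gateway node, so use that hostname, not localhost.
        "--connect", f"jdbc:mysql://gateway/{database}",
        "--username", "cloudera",
        "--password", "cloudera",
        "--table", "customer",
        # Exact HDFS directory required by the question.
        "--target-dir", "/user/cert/solution3",
        # Plain text output with comma-delimited fields.
        "--as-textfile",
        "--fields-terminated-by", ",",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_sqoop_import()))
```

Text files with comma-delimited fields are already Sqoop's default, so the last two options mainly make the intent explicit.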
Sample CCP Data Engineer question #1
Instructions
 Dualcore Inc. is a leading electronics retailer. All of their customer data is in a
relational database. Your task is to ingest all this data into their Hadoop
cluster in the proper file format and compression for their needs.
 Dualcore has a number of requirements for this data. It must be stored in a
binary file format. They will keep this data for a minimum of ten years, so
select a format that supports access from multiple programming languages
and backward compatibility if the schema ever changes. They also require
that the data be stored in a compressed format. The data is queried
regularly, so choose a compression codec that is fastest for compression and
decompression and included with CDH.
Data Description ...
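One plausible reading of these requirements, not confirmed by the slides, is Avro (binary, readable from many languages, supports schema evolution) compressed with Snappy (a fast codec shipped with CDH). The sketch below assembles a matching Sqoop command. The connection URL, table and target directory are left as parameters because the slide elides the data description, and the helper name is illustrative.

```python
from typing import List

# Hadoop codec class for Snappy compression.
SNAPPY_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"


def build_avro_snappy_import(jdbc_url: str, table: str, target_dir: str) -> List[str]:
    """Return the argument list for a Sqoop import that writes Snappy-compressed Avro.

    All three arguments must come from the real data description; none are
    known from the slide.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Avro data files: binary, language-neutral, schema stored with the data.
        "--as-avrodatafile",
        # Enable compression and select the codec explicitly.
        "--compress",
        "--compression-codec", SNAPPY_CODEC,
    ]
```

Credentials are omitted on purpose; add them according to what the exam question supplies.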
Sample CCP Data Engineer question #2
Instructions
LoudAcre Mobile is a mobile phone service provider that is moving a portion of their
customer analytics workload to Hadoop. Before they can use their customer data,
they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different input
methods wrote date fields in different formats. Your task is to standardize these
date fields into a consistent format.
Data Description ...
1943233 Chrisopher Rodrigez Jan 11, 1980
8989022 John Birchall 6/7/1967
2933321 Thomas Stewart 08/22/54
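The clean-up logic for the three sample date styles can be sketched in plain Python before porting it to Spark, Hive or Impala. This is a minimal sketch under stated assumptions: dates are month-first (suggested by "08/22/54", where 22 can only be a day), two-digit years belong to the 1900s, and the target format is ISO YYYY-MM-DD. The question itself does not specify any of these.

```python
from datetime import datetime

# Candidate input formats, tried in order. Month-first ordering is an assumption.
FORMATS = ("%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def standardize_date(raw: str) -> str:
    """Convert one of the known input styles to YYYY-MM-DD."""
    text = raw.strip()
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not match, try the next one
        if fmt.endswith("%y"):
            # Assumption: two-digit years are 19xx (Python would otherwise
            # map "54" to 2054).
            parsed = parsed.replace(year=1900 + parsed.year % 100)
        return parsed.strftime("%Y-%m-%d")
    raise ValueError(f"unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for sample in ("Jan 11, 1980", "6/7/1967", "08/22/54"):
        print(sample, "->", standardize_date(sample))
```

Raising on unknown formats, rather than passing the value through, makes it obvious when the real data contains a style the samples did not show.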
How to Study for CCA and CCP certification
 Set aside 2 to 3 days of dedicated study time for certification
 These certification tests are not easy
 Review the certification webpage study points
 Only study using the certification open book linked documentation
 No Google, Cloudera training material, or favourite tutorials
 Practice with the CDH and Spark software versions found in the test
 Be familiar with Hive, the Impala shell, the basic Linux shell and the Hue UI
Practice all of the study points
 Stop when confident you know the topic by practising it
 Ensure you know the syntax and have experienced the gotchas
 Read all the documentation concerned with the study topic
 Know the documented examples so you have a go-to source for copy/paste
 Know where to look up parameters, config and API docs
 Be able to adapt to different scenarios or link topics together
 Questions have multiple parts and dependencies
Taking the exam
 CCA Data Analyst and Developer: 2 hours, 9 questions, about 13 mins per question
 CCA Admin: 2 hours, 10 questions, 12 mins per question
 CCP Data Engineer: 4 hours, 7 questions, about 34 mins per question
 Some questions are done in 5 mins; others take 20+ or even 45+ mins
 Questions are weighted in value and can have multiple parts
 Risk of running out of time, which means:
 Can’t complete the easy questions to pass
 Can’t check your answers to fix any problems to pass
 Stop any question after 20 mins and come back at the end
 Skip any question that looks too hard after a quick skim read and come back
 Finished? Always double check your answers
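The per-question time budgets quoted above are simply the exam length divided by the number of questions, rounded down. A small sketch of that arithmetic, using only the durations and question counts from this slide (the dictionary and function names are illustrative):

```python
# (total minutes, number of questions) as quoted on the slide.
EXAMS = {
    "CCA Data Analyst / Developer": (120, 9),
    "CCA Admin": (120, 10),
    "CCP Data Engineer": (240, 7),
}


def minutes_per_question(total_minutes: int, questions: int) -> int:
    """Average whole minutes available per question, rounded down."""
    return total_minutes // questions


if __name__ == "__main__":
    for exam, (total, count) in EXAMS.items():
        print(f"{exam}: about {minutes_per_question(total, count)} mins per question")
```

These are averages only; since questions are weighted and vary in length, the 20-minute cut-off rule matters more than the average.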
Common certification exam problems
 Review the certification FAQ for common problems and for the status of questions marked wrong
 https://www.cloudera.com/more/training/certification/faq.html
 Remote desktop or network too slow!
 Take the exam at off-peak times. Use the command-line shell, not the Hue GUI.
 Unfamiliar with the question's topic. Time wasted reading docs in exam time. Study!
 Don’t use localhost; use the correct gateway/master/worker hostname instead
 Rushing and stress cause mistakes:
 Misinterpreted what the question asked.
 Are directory, file, property and column names spelled correctly?
 Is the output data format 100% correct? Check that column order, data types and null values are what was asked. Don’t assume.
 Notice any errors in the logs or console when running? Scroll back and check!
Tips for studying CCA Admin
 Know Cloudera Manager UI and how to search properties
 Breadcrumbs, instances, safety valve advanced settings
 Don’t forget to apply settings or restart the service, and don’t break the cluster!
 Practice topics not in the admin course but in the exam:
 Sentry setup, Load balancer, Log redaction and Encrypted zones
 Practice all the hdfs dfs and dfsadmin commands
 Practice setting up services and service instances
 Practice troubleshooting and fixing common problem applications
 Know your way around the different log files
Tips for studying Data Analyst certification
 Study how to use regex to manipulate strings well
 SQL subqueries need an alias (a temporary table name); don’t forget it
 Understand Sqoop warehouse dir and target dir relationship
 Practice Sqoop help to quickly view and use parameters
 Practice window analytic functions - not easy to do
 Practice type conversions for Hive and Impala
 Practice how to create partitioned/bucketed tables – lots of syntax
 Copy and paste directly from the question to quickly create the table
 Practice using the command line: beeline and impala-shell
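The relationship between Sqoop's --warehouse-dir and --target-dir mentioned in the tips above can be summarized as a small rule: --target-dir names the exact output directory, --warehouse-dir names a parent directory under which Sqoop creates a subdirectory per table, and the two options cannot be combined. The sketch below models that rule; the function name and the example paths in the usage note are hypothetical.

```python
import posixpath
from typing import Optional


def sqoop_output_dir(table: str,
                     target_dir: Optional[str] = None,
                     warehouse_dir: Optional[str] = None) -> str:
    """Return the HDFS directory a single-table Sqoop import writes to."""
    if target_dir and warehouse_dir:
        # Sqoop rejects an import that specifies both options.
        raise ValueError("--target-dir and --warehouse-dir are mutually exclusive")
    if target_dir:
        # Files land directly in the named directory.
        return target_dir
    if warehouse_dir:
        # Files land in a subdirectory named after the table.
        return posixpath.join(warehouse_dir, table)
    # With neither option, output goes to a directory named after the table,
    # relative to the user's HDFS home directory.
    return table
```

For example, with a hypothetical warehouse directory /user/cert/warehouse, the customer table would end up in /user/cert/warehouse/customer, whereas a target directory of /user/cert/solution3 is used as-is.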
Tips for studying CCA Spark and Hadoop
 No need to be an expert in Scala or Python coding.
 Only testing Spark knowledge.
 Practice Sqoop, the hdfs dfs command line and your SQL
 Certification has not yet been updated to Spark 2.0 (uses 1.6)
 New students may not be familiar with Spark 1.6. Minor differences.
 Read and practice using the Spark documentation
 Start the 1.6 Spark shell with pyspark and spark-shell, not spark2-shell or pyspark2
Tips for studying CCP Data Engineer
 Study non core topics found outside the training course material
 Ignore what is not Cloudera supported
 Oozie features in one third of the test!
 See the gethue.com website for short Oozie UI tutorials
 How to get Oozie to run on your small default cluster:
 Adjust container memory so you can run multiple containers
 Increase the NodeManager max container size to 7 GB
 Limit the container memory max size to 3 GB and 1 CPU
 Result on a dual-core, 8 GB, 3x worker node cluster: 6 containers.
 Currently Spark 1.6 not Spark 2.0 (will be updated in the future)
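The container count in the Oozie tip above follows from simple arithmetic: each worker can host as many containers as fit in both its memory and its cores. The sketch below reproduces the figure of 6, assuming every container requests the full 3 GB and 1 vcore. In YARN terms the two settings probably correspond to yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb, but that mapping is an assumption, not something the slides state.

```python
def containers_per_node(node_memory_gb: int, node_vcores: int,
                        container_memory_gb: int, container_vcores: int) -> int:
    """Containers that fit on one worker, limited by memory and by cores."""
    by_memory = node_memory_gb // container_memory_gb
    by_cores = node_vcores // container_vcores
    return min(by_memory, by_cores)


# Values from the slide: 7 GB available to YARN per node, dual-core workers,
# containers capped at 3 GB and 1 CPU, three worker nodes.
WORKERS = 3
PER_NODE = containers_per_node(node_memory_gb=7, node_vcores=2,
                               container_memory_gb=3, container_vcores=1)
TOTAL_CONTAINERS = WORKERS * PER_NODE

if __name__ == "__main__":
    print(f"{PER_NODE} containers per node, {TOTAL_CONTAINERS} in total")
```

Oozie needs spare containers because each workflow action runs a launcher job in addition to the job it starts, which is why a single-container-per-node setup tends to stall.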
Qualify for free certification
 Take part in a Data Analyst, Developer or Administrator Public class to
receive a free certification exam in the given discipline
 Valid till the end of April
Thank you
