Big Data – Integrated Course
Setup – Hadoop, Spark, Kafka and NoSQL Environment
• Install and configure VirtualBox
• Load and configure an RHEL-based Virtual Machine
• Install and configure basic software on the VM
• User setup and database account creation
• Configure SSH and check port availability
Hadoop - HDFS (Hadoop - Distributed File System)
• Hadoop Distributed File System: background and GFS (Google File System) origins
• HDFS configuration files – core-site.xml, hdfs-site.xml, mapred-site.xml
• Data Replication – Static and Dynamic configuration
• Data Storage – Block Size details
• HDFS - DFS shell commands
• HDFS - Admin commands and data recovery
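Block size and replication come down to simple arithmetic. A minimal sketch, assuming the Hadoop 2.x+ defaults of 128 MB for `dfs.blocksize` and 3 for `dfs.replication` (both set statically in hdfs-site.xml; replication can also be changed per path at runtime with `hdfs dfs -setrep`). `hdfs_footprint` is a hypothetical helper, not a Hadoop API:

```python
import math

MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB   # assumed Hadoop 2.x+ default for dfs.blocksize
DEFAULT_REPLICATION = 3         # assumed default for dfs.replication


def hdfs_footprint(file_size, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Hypothetical helper: block count, stored block replicas and raw bytes for one file.

    HDFS does not pad the last block, so raw usage is file_size * replication,
    not num_blocks * block_size * replication.
    """
    num_blocks = math.ceil(file_size / block_size) if file_size else 0
    block_replicas = num_blocks * replication
    raw_bytes = file_size * replication
    return num_blocks, block_replicas, raw_bytes


blocks, replicas, raw = hdfs_footprint(300 * MB)
print(blocks, replicas, raw // MB)  # 3 9 900
```

A 300 MB file therefore needs three blocks (two full, one holding 44 MB), stored as nine block replicas and 900 MB of raw disk across the cluster.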
Hadoop - MapReduce Framework
• MapReduce Introduction
• Writing MapReduce Programs
• Mappers and Reducers details
• Running MR jobs
• Configure custom Map and Reduce jobs
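The MapReduce model can be illustrated in a single Python process. This is a sketch of the classic word count, not the Hadoop Java API (`Mapper`/`Reducer` classes): `mapper`, `shuffle`, `reducer` and `run_job` are hypothetical names, and on a real cluster map tasks run in parallel over input splits while the framework performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort phase: group all values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts for one word."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce, as the framework would for one job."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in grouped.items())


print(run_job(["the quick fox", "The lazy dog", "the end"]))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The reducer only ever sees one key and its grouped values, which is what lets Hadoop run many reducers independently.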
Hadoop - Apache Hive
• Hive Installation and Metastore setup
• Hive Shell commands
• HiveQL Basics
• Hive data load in Local and MapReduce mode
• Working with Tables, Databases, etc.
• Hands-on Exercises and Assignments
Spark - Spark Installation and Introduction
• Apache Spark Installation (version 2.x)
• Spark shell and PySpark shell setup
• Spark Executor and Executor cores setup
• Spark logging configuration
• Writing UDFs (user-defined functions)
Spark - Scala Installation and Introduction
• Scala Installation (version 2.x)
• Scala setup for Spark environment
• Scala-based Spark exercises
Spark - Resilient Distributed Datasets (RDD)
• Working with RDDs in Spark
• Creating RDDs from scratch
• Creating RDDs from preexisting data
• Accumulators and Broadcast variables
• RDD – Transformations
• RDD – Actions
• Complex RDD exercises
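The split between transformations and actions is about lazy evaluation. The sketch below uses a hypothetical `MiniRDD` class (plain Python, no Spark) to show the idea: `map` and `filter` only record lineage, while `collect` and `count` replay it. In PySpark the equivalent chain would start from `sc.parallelize(...)`.

```python
from typing import Callable, Iterable, List


class MiniRDD:
    """Hypothetical single-machine stand-in for an RDD.

    Transformations (map, filter) return a new MiniRDD with one more recorded
    step; nothing is computed until an action (collect, count) is called.
    """

    def __init__(self, source: Iterable, steps: tuple = ()) -> None:
        self._source = list(source)
        self._steps = steps  # recorded lineage: ("map" | "filter", function)

    def map(self, fn: Callable) -> "MiniRDD":
        return MiniRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(self._source, self._steps + (("filter", pred),))

    def collect(self) -> List:
        # Action: replay the recorded lineage over the source data.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

    def count(self) -> int:
        # Action: also triggers a full recomputation of the lineage.
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # records when the function actually runs
    return x * x


odd_squares = MiniRDD(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
print(calls)                  # [] : transformations ran nothing yet
print(odd_squares.collect())  # [1, 9, 25]
print(odd_squares.count())    # 3
```

As in Spark, each action recomputes the lineage from scratch; Spark avoids this with `cache()`/`persist()`, which this sketch does not model.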
Spark – Spark SQL and Data Frames
• Spark SQL and the SQLContext
• Creating DataFrames from raw datasets
• Transforming and Querying DataFrames
• Using CSV files and mapping schemas
• Using case structures and user-defined data types
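Mapping a schema onto a CSV file means declaring column types up front instead of inferring them. In Spark this is typically done by passing a `StructType` to `spark.read.csv(path, header=True, schema=...)`; the plain-Python sketch below mimics the idea with a hypothetical `read_csv_with_schema` helper and (column, converter) pairs. The sample rows are made up.

```python
import csv
import io
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical explicit schema: ordered (column name, converter) pairs, playing
# the role a StructType plays when reading CSV in Spark.
Schema = List[Tuple[str, Callable[[str], Any]]]


def read_csv_with_schema(text: str, schema: Schema) -> List[Dict[str, Any]]:
    """Parse CSV text with a header row and convert each column per the schema."""
    rows: List[Dict[str, Any]] = []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, raw in enumerate(reader, start=1):
        try:
            rows.append({name: convert(raw[name]) for name, convert in schema})
        except (KeyError, TypeError, ValueError) as exc:
            # Fail loudly instead of silently keeping strings or nulls.
            raise ValueError(f"data row {row_no}: cannot apply schema ({exc!r})") from exc
    return rows


PAYMENTS_SCHEMA: Schema = [("id", int), ("amount", float), ("paid_on", date.fromisoformat)]
sample = "id,amount,paid_on\n1,250.50,2019-03-01\n2,99.99,2019-03-02\n"
rows = read_csv_with_schema(sample, PAYMENTS_SCHEMA)
print(rows[0])  # {'id': 1, 'amount': 250.5, 'paid_on': datetime.date(2019, 3, 1)}
```

Raising on a bad row is a design choice here; Spark's CSV reader instead follows its `mode` option, which is PERMISSIVE by default.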
Spark - Spark MLlib (Machine Learning)
• Basic Principles of Machine Learning
• Supervised and Unsupervised Learning
• Setting up Machine Learning in Spark
• Transformations and Correlation Algorithms
• Exercises on Regression and Correlation
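The correlation and regression exercises rest on two closed-form statistics. A plain-Python sketch for a single feature, with made-up sample data: Pearson's r (the default method in Spark's correlation utilities) and an ordinary least-squares line fit. Spark MLlib's `LinearRegression` fits the same kind of model on distributed data (optionally with regularization), so this is for intuition only.

```python
import math
from typing import Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), d = deviation from the mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
slope, intercept = fit_line(xs, ys)
r = pearson(xs, ys)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
# slope=1.97 intercept=0.11 r=0.999
```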
Kafka - Apache Kafka
• Introduction to Apache Kafka
• Identifying the major Kafka components
• Determining what data is appropriate for use with Kafka
• Developing with Kafka producers, consumers, and brokers
Kafka - Installation and Labs
• Kafka Features and terminology
• High-level Kafka architecture
• Kafka Installation on Linux/Windows
• Install ZooKeeper for Kafka
• Install Kafka Server
Kafka - Consumer, Producer and Topics
• Writing Kafka Consumers (labs)
• Create Kafka Messages
• Create Kafka Topics
• Message structure and topic configuration
• Write Kafka Producer
• Configure Producer and Kafka Server
• Kafka Multi-Broker Configuration
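Topics, partitions and offsets can be modelled in a few lines. In this hypothetical `MiniTopic` class each partition is an append-only list, a record's offset is its index in that list, and records with the same key always land in the same partition, which is what preserves per-key ordering. The hash is an assumption: Kafka's default partitioner uses murmur2 on the serialized key, whereas this sketch uses `zlib.crc32`, so partition numbers will not match a real cluster. Keyless records, replication and consumer groups are not modelled.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


class MiniTopic:
    """Hypothetical in-memory model of a Kafka topic: N partitions, each an append-only log."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, value: str) -> Tuple[int, int]:
        """Producer side: append the record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def poll(self, partition: int, offset: int) -> List[Record]:
        """Consumer side: read one partition from a given offset onward."""
        return self.partitions[partition][offset:]


topic = MiniTopic("payments", num_partitions=3)
acks = [topic.send("user-42", f"event-{i}") for i in range(3)]
partition = acks[0][0]
print(acks)  # same partition each time, offsets 0, 1, 2
print(topic.poll(partition, 0))
# [('user-42', 'event-0'), ('user-42', 'event-1'), ('user-42', 'event-2')]
```

A consumer that remembers the last offset it processed can resume with `poll(partition, last_offset + 1)`, which is the idea behind committed offsets.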
NoSQL - Introduction and Details
• NoSQL databases introduction
• Types of NoSQL databases – MongoDB, Cassandra, CouchDB
• Use cases for NoSQL databases
• Document DB types
• Comparison with RDBMS
NoSQL - MongoDB
• MongoDB installation on a Linux/Windows box
• MongoDB daemon threads
• Mongo Shell configuration
• MongoDB collection creation
• Loading data into MongoDB collections
NoSQL – MongoDB Query Language
• MongoDB query language
• MongoDB insert, update and delete queries (e.g. insertOne(), updateOne(), deleteOne())
• MongoDB find() queries
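MongoDB `find()` takes a query document such as `{ age: { $gt: 30 }, city: "Pune" }`, where plain values mean equality and `$`-prefixed keys are comparison operators. The in-memory matcher below is a hypothetical sketch of those semantics for top-level fields only; real MongoDB also handles nested paths, arrays, type ordering and indexes, none of which are modelled here, and the sample documents are made up.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel for "field not present in the document"


def _compare(op: str, value: Any, arg: Any) -> bool:
    """Evaluate one comparison operator; only a small subset is modelled."""
    if op == "$eq":
        return value == arg
    if op == "$ne":
        return value != arg          # a missing field counts as "not equal"
    if op == "$in":
        return value in arg
    if op in ("$gt", "$gte", "$lt", "$lte"):
        if value is _MISSING:
            return False             # range operators never match a missing field
        if op == "$gt":
            return value > arg
        if op == "$gte":
            return value >= arg
        if op == "$lt":
            return value < arg
        return value <= arg
    raise ValueError(f"unsupported operator: {op}")


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every top-level condition in the query holds (implicit AND)."""
    for field, cond in query.items():
        value = doc.get(field, _MISSING)
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            for op, arg in cond.items():
                try:
                    if not _compare(op, value, arg):
                        return False
                except TypeError:    # e.g. comparing a string with a number
                    return False
        elif value != cond:          # plain value: equality match
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"name": "Asha", "age": 34, "city": "Pune"},
    {"name": "Ravi", "age": 28, "city": "Pune"},
    {"name": "Meera", "age": 41, "city": "Mumbai"},
]
# mongo shell equivalent: db.users.find({ age: { $gt: 30 }, city: "Pune" })
print(find(users, {"age": {"$gt": 30}, "city": "Pune"}))
# [{'name': 'Asha', 'age': 34, 'city': 'Pune'}]
```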
Study Materials and Labs
1) A complete Virtual Machine is shared with students, with Java, Oracle DB, Mozilla Firefox and other components pre-installed.
2) The VM can still be used after the training is done. Please note it is NOT a remote lab environment: you keep the VM and all labs after the training is completed.

Author: Kamal A
