Machine Learning Using Spark
Copyright © 2019 Learntek. All Rights Reserved.
What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that gives
systems the ability to learn and improve from experience automatically, without
being explicitly programmed. Machine learning focuses on developing computer
programs that can access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in which the program looks for patterns so that it can
make better decisions in the future based on the examples we provide. The primary
aim is to allow computers to learn automatically, without human intervention or
assistance, and to adjust their actions accordingly.
Intro to Machine Learning Using Spark
MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine
learning scalable and easy. At a high level, it provides tools such as:
ML Algorithms: common learning algorithms such as classification, regression,
clustering, and collaborative filtering
Featurization: feature extraction, transformation, dimensionality reduction, and
selection
Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
Persistence: saving and loading algorithms, models, and Pipelines
Utilities: linear algebra, statistics, data handling, etc.
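In MLlib, a Pipeline chains stages of two kinds: Transformers, which map one dataset to another through `transform()`, and Estimators, which are fit to data through `fit()` and produce a Transformer. The following is a minimal plain-Python sketch of that pattern under stated assumptions, not the MLlib API: the classes `LowercaseTokenizer`, `VocabularyEstimator`, `VocabularyModel`, `SimplePipeline` and `SimplePipelineModel` are hypothetical, and they work on plain lists rather than Spark DataFrames.

```python
# Plain-Python sketch of the Transformer / Estimator / Pipeline pattern.
# All class names here are hypothetical; this is not the MLlib API.

class LowercaseTokenizer:
    """Transformer: turns each text row into a list of lowercase tokens."""
    def transform(self, rows):
        return [row.lower().split() for row in rows]


class VocabularyModel:
    """Transformer produced by VocabularyEstimator.fit()."""
    def __init__(self, vocab):
        self.vocab = vocab  # token -> vector index

    def transform(self, rows):
        # Count occurrences of each known token; unknown tokens are ignored.
        vectors = []
        for tokens in rows:
            vec = [0] * len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:
                    vec[self.vocab[tok]] += 1
            vectors.append(vec)
        return vectors


class VocabularyEstimator:
    """Estimator: learns a vocabulary from the data and returns a model."""
    def fit(self, rows):
        tokens = sorted({tok for row in rows for tok in row})
        return VocabularyModel({tok: i for i, tok in enumerate(tokens)})


class SimplePipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class SimplePipeline:
    """Estimator that fits its stages in order, feeding each output to the next stage."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):  # estimators are replaced by the model they produce
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return SimplePipelineModel(fitted)


pipeline = SimplePipeline([LowercaseTokenizer(), VocabularyEstimator()])
model = pipeline.fit(["Spark makes ML scalable", "spark MLlib"])
print(model.transform(["MLlib on Spark"]))  # [[0, 0, 1, 0, 1]]
```

Because the fitted `SimplePipelineModel` only calls `transform()`, new data passes through exactly the stages that were learned during fitting.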
Tools
This course is delivered using the Scala and Python APIs. R is also used to explain
statistical concepts, and visualization is covered with the Bokeh and ggplot
libraries.
Introduction to Apache Spark
Spark Programming Model
RDDs and DataFrames
Transformations and Actions
Broadcast Variables and Accumulators
Running HDP on a local machine
Launching a Spark cluster
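In Spark's programming model, transformations such as `map` and `filter` are lazy: they only record what should be computed, and nothing runs until an action such as `collect` or `count` asks for a result. The plain-Python sketch below imitates that behaviour; `LazyDataset` is a hypothetical class, not Spark's RDD, and it runs on a single machine without partitions or a cluster.

```python
# Plain-Python sketch of lazy transformations vs. eager actions.
# LazyDataset is a hypothetical stand-in, not Spark's RDD class.

class LazyDataset:
    def __init__(self, source, ops=()):
        self._source = source   # underlying data, never copied
        self._ops = tuple(ops)  # recorded transformations, not yet executed

    # Transformations: return a new dataset and do no work.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: execute the recorded operations element by element.
    def _iterate(self):
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # a filter that rejects the element
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        return list(self._iterate())

    def count(self):
        return sum(1 for _ in self._iterate())


calls = []

def square(x):
    calls.append(x)  # records when the function actually runs
    return x * x

ds = LazyDataset(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
print(calls)         # []  (no transformation has run yet)
print(ds.collect())  # [1, 9, 25]
```

Note that each action re-runs the whole recorded chain; this is why Spark offers caching for datasets that several actions reuse.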
Basic Statistics
Descriptive Statistics
• Mean, Mode, Median, Range, Variance, Standard Deviation, Quartiles, Percentiles
Sampling
Sampling Methods
Sampling Errors
Probability Distributions
• Normal distribution, t-distribution, Chi-square distribution, F-distribution
Margin of Error, Confidence Interval, Significance Level, Degrees of Freedom
Hypothesis testing concepts, Type I and Type II errors
P-value, t-Test, Chi-square Test
Correlation Coefficient
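Most of the descriptive statistics listed above can be computed with Python's standard `statistics` module. The sketch below applies it to a small made-up sample; it is plain Python, not Spark (on a Spark DataFrame you would use Spark's own aggregations instead). The Pearson correlation coefficient is written out by hand so the formula is visible, and the quartiles use Python's default "exclusive" interpolation, which other tools may compute slightly differently.

```python
import math
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]        # made-up sample

mean = st.mean(data)                   # 5
median = st.median(data)               # 4.5 (average of the two middle values)
mode = st.mode(data)                   # 4 (most frequent value)
data_range = max(data) - min(data)     # 7
variance = st.pvariance(data)          # 4 (population variance, divides by n)
std_dev = st.pstdev(data)              # 2.0
sample_variance = st.variance(data)    # sample variance, divides by n - 1
quartiles = st.quantiles(data, n=4)    # [4.0, 4.5, 6.5] (Q1, median, Q3)


def pearson(xs, ys):
    """Pearson correlation r = cov(x, y) / (sd(x) * sd(y))."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(round(pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

The normalisation factors cancel in the ratio, so `pearson` divides sums directly rather than computing the covariance and standard deviations with an explicit 1/n.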
Machine Learning Using Spark
Introduction to Spark MLlib
Data types: Vector, LabeledPoint
Feature Extraction
Feature Transformation, Normalization
Feature Selectors
Locality-Sensitive Hashing (LSH)
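Feature transformation often means rescaling raw features so that no feature dominates just because of its magnitude. MLlib provides transformers for this, for example `Normalizer` (scales each vector to unit p-norm) and `MinMaxScaler` (rescales each feature to a range, [0, 1] by default). The plain-Python sketch below shows what the two computations do on small lists; it is an illustration rather than MLlib's implementation, and mapping a constant column to 0.5 is a choice made for this sketch to avoid dividing by zero.

```python
import math


def l2_normalize(row):
    """Scale one feature vector (a row) to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm > 0 else list(row)


def min_max_scale(rows):
    """Rescale every feature (column) to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.5  # constant column -> midpoint
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


print(l2_normalize([3.0, 4.0]))                    # [0.6, 0.8]
print(min_max_scale([[1, 10], [2, 20], [3, 30]]))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Row normalization and column scaling answer different questions: the first makes vectors comparable by direction (useful for cosine-style similarity), the second puts all features on a common scale.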
Regression Analysis with Spark
Types of Regression Models
Gradient Descent
Linear Regression, Generalized Linear
Regression
MSE, RMSE, MAE, R-squared (coefficient of determination)
Transforming the target variable
Tuning Model Parameters
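Gradient descent fits a regression model by repeatedly moving the parameters a small step against the gradient of the loss. The sketch below, in plain Python rather than MLlib, fits a one-feature linear model y = w*x + b by batch gradient descent on the mean squared error, then computes MSE, RMSE, MAE and R-squared (1 - SS_res / SS_tot). The learning rate and number of epochs are arbitrary values that happen to suit this made-up data; real data usually needs them tuned.

```python
import math


def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R-squared; assumes y_true is not constant."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]    # made-up data lying exactly on y = 2x + 1
w, b = fit_linear_gd(xs, ys)
print(round(w, 3), round(b, 3))   # 2.0 1.0
print(regression_metrics(ys, [w * x + b for x in xs]))
```

RMSE is reported in the same units as the target, which is often easier to interpret than MSE; R-squared compares the model against always predicting the mean.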
Classification Models with Spark
Types of Classification Models
• Linear Models, Naive Bayes Model, Decision Trees
Logistic Regression
Linear Support Vector Machine
Random Forest
Gradient-Boosted Trees
Training Classification Models
Accuracy and prediction error
Precision and Recall
ROC curve and AUC
Cross-validation
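Accuracy, prediction error, precision and recall all come from the counts of true/false positives and negatives, while the area under the ROC curve (AUC) equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties count as one half). The plain-Python sketch below computes these metrics on made-up labels and scores; the 0.5 decision threshold is an assumption of the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted positives, how many are right
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of actual positives, how many were found
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(classification_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))         # 0.889
```

Unlike accuracy, precision and recall, AUC does not depend on the chosen threshold, because it only uses the ranking of the scores.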
Clustering
Hierarchical clustering
K-means clustering
Dimensionality Reduction
Principal Component Analysis
Singular Value Decomposition
Clustering as dimensionality reduction
Training a dimensionality reduction model
Evaluating dimensionality reduction models
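K-means is commonly trained with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of the points assigned to it, and repeat until the centroids stop changing. A plain-Python sketch of that loop on made-up 2-D points follows. Initializing from k randomly chosen data points is a simplification made here; MLlib's `KMeans` uses the k-means|| initialization by default.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points, k, max_iter=20, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Update step: move the centroid to the mean of its members.
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids = kmeans(points, k=2)
print(sorted(centroids))
print(assign(points, centroids))
```

Lloyd's algorithm only finds a local optimum, which is why the choice of initial centroids matters in practice.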
Recommendation Engine
Content-based filtering
Collaborative filtering
Overview of the MovieLens data
Training a recommendation model
Using the recommendation model
Performance Evaluation
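Collaborative filtering predicts how a user would rate an item from the ratings of similar users. MLlib's recommender is based on alternating least squares (ALS) matrix factorization; the plain-Python sketch below swaps in a different, simpler technique, user-based cosine similarity, purely to show the idea. The ratings are made up and are not taken from the MovieLens data, and computing similarity only over co-rated movies is a choice made for this sketch.

```python
import math

# Made-up ratings: user -> {movie: rating}
ratings = {
    "alice": {"Toy Story": 5, "Heat": 1, "Alien": 4},
    "bob": {"Toy Story": 4, "Heat": 2, "Alien": 5, "Fargo": 4},
    "carol": {"Toy Story": 1, "Heat": 5, "Fargo": 2},
}


def cosine(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = math.sqrt(sum(u[m] ** 2 for m in common))
    nv = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (nu * nv)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie` (None if nobody rated it)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else None


print(round(predict(ratings, "alice", "Fargo"), 2))  # 3.43
```

Bob's tastes are much closer to Alice's than Carol's are, so the prediction leans toward Bob's rating of 4.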
Text Processing
Feature Hashing
TF-IDF model
Tokenization
Stop words
TF-IDF Weightings
Training a TF-IDF model
Using a TF-IDF model
Evaluating TF-IDF models
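TF-IDF weights a term by how often it appears in a document (TF) and down-weights terms that appear in many documents (IDF). MLlib's `HashingTF` computes term frequencies with feature hashing, and the Spark documentation defines the smoothed IDF as log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) is the number of documents containing term t. The plain-Python sketch below mirrors those steps plus tokenization and stop-word removal; the tiny stop-word list, the 16-bucket size and the use of `zlib.crc32` as the hash function are assumptions of this example (Spark uses its own hash function and a much larger default number of features).

```python
import math
import zlib

STOP_WORDS = {"a", "is", "the"}  # tiny illustrative stop-word list (an assumption)


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def hashing_tf(tokens, num_features=16):
    """Term frequencies via feature hashing: token counted in bucket hash(token) % num_features."""
    vec = [0.0] * num_features
    for tok in tokens:
        # zlib.crc32 is a deterministic stand-in hash; colliding tokens share a bucket.
        vec[zlib.crc32(tok.encode("utf-8")) % num_features] += 1.0
    return vec


def fit_idf(tf_vectors):
    """Smoothed inverse document frequency log((m + 1) / (df + 1)) for each bucket."""
    m = len(tf_vectors)
    df = [sum(1 for vec in tf_vectors if vec[j] > 0) for j in range(len(tf_vectors[0]))]
    return [math.log((m + 1) / (d + 1)) for d in df]


docs = ["The cat sat", "the dog sat", "a cat is a cat"]
tf = [hashing_tf(tokenize(d)) for d in docs]
idf = fit_idf(tf)
tfidf = [[t * w for t, w in zip(vec, idf)] for vec in tf]
print(tokenize(docs[2]))  # ['cat', 'cat']
```

Hashing avoids storing a vocabulary, at the cost of occasional collisions between unrelated terms; a larger number of buckets makes collisions rarer.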
Prerequisites:
Prior understanding of exploratory data analysis and data visualization, including
basic statistical techniques for data analysis, will help immensely in learning
machine learning concepts and applications. Some knowledge of R programming or of
Python packages such as scikit-learn and NumPy will also be useful. However, basic
statistical techniques are covered as part of this course before going deep into
machine learning, so that everyone can get the most out of the course.
For more training information, contact us:
Email: info@learntek.org
USA: +1 734 418 2465
INDIA: +40 4018 1306
+7799713624
