Robert	Hryniewicz
Data	Evangelist
@RobHryniewicz
Hands-on	Intro	to	Data	Science
with	Apache	Spark
Crash Course
© Hortonworks Inc. 2011–2016. All Rights Reserved
Plan for Today
• Data Science & ML
• ML Examples
• Overview of ML methods
• K-means, Decision Trees & Random Forests
• Spark MLlib & ML
• Lab Overview
Data	Science	Examples
Predictive Analytics Pre-requisites
Sales	Play	4:	Predictive	Analytics
Predictive Analytics Process and Tools
Machine	Learning
“… science of how computers learn without being explicitly programmed” – Andrew Ng
Machine	Learning	Methods
Supervised vs Unsupervised Learning
Supervised: examples are labeled.
Unsupervised: examples are not labeled.
Unsupervised Learning / Supervised Learning
CLASSIFICATION
Identifying which category an object belongs to.
Applications:	spam	detection,	image	recognition,	...
Algorithms:	k-nn,	decision	trees,	random	forest,	...
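As a rough illustration of classification, here is a minimal k-nearest-neighbours sketch in plain Python. The toy features and the "spam"/"ham" labels are invented for this example and are not from the lab data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Take the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two numeric features per message, with a label.
train = [
    ((1.0, 1.0), "ham"),
    ((1.2, 0.8), "ham"),
    ((0.9, 1.1), "ham"),
    ((5.0, 5.0), "spam"),
    ((5.2, 4.9), "spam"),
    ((4.8, 5.1), "spam"),
]
prediction = knn_predict(train, (5.1, 5.0), k=3)
```

The query point sits next to the three "spam" examples, so all three nearest neighbours vote for that label.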
REGRESSION
Predicting a continuous-valued attribute associated with an object.
Applications:	drug	response,	stock	prices,	…
Algorithms: linear	regression,	…
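A minimal sketch of regression, assuming a single input feature and ordinary least squares. The data points are made up so that the true relationship is y = 2x + 1.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for an unseen x
```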
CLUSTERING
Automatic	grouping	of	similar	objects	into	sets.
Applications:	customer	segmentation,	topic	modeling,	…
Algorithms: k-means,	LDA,	…
COLLABORATIVE	FILTERING
Fill	in	the	missing	entries	of	a	user-item	association	matrix.
Applications:	Product	recommendation,	…
Algorithms: Alternating Least Squares (ALS)
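To make "fill in the missing entries" concrete, here is a toy rank-1 ALS sketch in plain Python. It assumes one latent factor per user and per item and no regularization, which is far simpler than what Spark's ALS does. The ratings are invented.

```python
def als_rank1(ratings, n_users, n_items, n_iter=50):
    """Rank-1 ALS on observed {(user, item): rating} entries, no regularization."""
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iter):
        # Hold item factors fixed; solve a 1-D least squares per user.
        for i in range(n_users):
            obs = [(j, r) for (ui, j), r in ratings.items() if ui == i]
            u[i] = sum(r * v[j] for j, r in obs) / sum(v[j] ** 2 for j, _ in obs)
        # Hold user factors fixed; solve a 1-D least squares per item.
        for j in range(n_items):
            obs = [(i, r) for (i, ij), r in ratings.items() if ij == j]
            v[j] = sum(r * u[i] for i, r in obs) / sum(u[i] ** 2 for i, _ in obs)
    return u, v


# Hypothetical user-item matrix [[1, 2, 3], [2, 4, ?]] with one missing entry.
ratings = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 3.0, (1, 0): 2.0, (1, 1): 4.0}
u, v = als_rank1(ratings, n_users=2, n_items=3)
missing = u[1] * v[2]  # estimate for user 1, item 2
```

Because the observed entries are exactly rank 1, the estimate for the missing cell approaches 6, the value that keeps the second row at twice the first.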
DIMENSIONALITY	REDUCTION
Reducing	the	number	of	random	variables	to	consider.
Applications:	visualization,	increased	efficiency,	…
Algorithms: PCA,	t-SNE,	…
PREPROCESSING
Feature	extraction	and	normalization
Applications:	transforming	input	data	such	as	text	as	input	to	ML	algorithms
Algorithms:	TF-IDF,	word2vec,	one	hot	encoding,	…
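A small sketch of turning text into numeric features with TF-IDF, in plain Python. It assumes the unsmoothed weighting tf * log(N / df); libraries often use a smoothed variant, so the exact numbers will differ. The documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical mini-corpus.
docs = ["spark makes ml simple", "spark runs on hadoop", "ml needs features"]
weights = tf_idf(docs)
```

Terms that occur in fewer documents ("makes") get a higher weight than terms shared across documents ("spark"), and a term present in every document would get weight 0.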
MODEL	SELECTION
Comparing,	validating	and	choosing	parameters	and	models.
Applications:	improved	accuracy	via	parameter	tuning
Algorithms:	grid	search,	metrics	…
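Grid search can be sketched as "try every parameter combination, score each, keep the best". The example below uses a toy one-feature threshold classifier and accuracy as the metric. The parameter names and data are invented for illustration.

```python
from itertools import product


def predict(x, threshold, positive_above):
    """Toy classifier: predict 1 on one side of the threshold, 0 on the other."""
    return int(x > threshold) if positive_above else int(x <= threshold)


def accuracy(data, threshold, positive_above):
    correct = sum(predict(x, threshold, positive_above) == y for x, y in data)
    return correct / len(data)


def grid_search(data, grid):
    """Evaluate every combination in `grid` and return the best (params, score)."""
    best_params, best_score = None, -1.0
    for threshold, positive_above in product(grid["threshold"], grid["positive_above"]):
        score = accuracy(data, threshold, positive_above)
        if score > best_score:
            best_params, best_score = (threshold, positive_above), score
    return best_params, best_score


# Hypothetical validation set of (feature, label) pairs.
validation = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
grid = {"threshold": [1.0, 2.0, 3.0], "positive_above": [True, False]}
best_params, best_score = grid_search(validation, grid)
```

In practice the score would come from a held-out validation set or cross-validation rather than the data the model was trained on.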
Spark	MLlib
Spark	Machine	Learning	Library
• Clustering
– k-means clustering
– latent Dirichlet allocation (LDA)
• Dimensionality reduction
– singular value decomposition (SVD)
– principal component analysis (PCA)
• Feature Extractors & Transformers
– word2vec
• Basic statistics
– summary statistics
– hypothesis testing
– random number generation
• Classification and regression
– linear models (SVMs, logistic & linear regression)
– decision trees
– ensembles of trees (Random Forests & GBTs)
• Collaborative filtering
– alternating least squares (ALS)
K-Means	Clustering
(Unsupervised	Learning)
Why K-Means?
• Simple & fast algorithm to find clusters
• Common technique for anomaly detection
• Drawbacks
– Doesn't work well with non-circular cluster shapes
– Number of clusters and initial seed values need to be specified beforehand
– Strong sensitivity to outliers and noise
– Can get stuck in a local optimum
Initialize Cluster Centers
Randomly pick 3 cluster centers.
Assign Each Point
Assign each point to the nearest cluster center.
Recompute Cluster Centers
Move each cluster center to the mean of the points assigned to it.
K-means Clustering
San Francisco
Outline Each Neighborhood
Folium: choropleth map
SF Neighborhood Centers Calculated with K-Means
Sample Dataset – K-Means
0.0, 0.0, 0.0
0.1, 0.1, 0.1
0.2, 0.2, 0.2
3.0, 3.0, 3.0
3.1, 3.1, 3.1
3.2, 3.2, 3.2
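The initialize / assign / recompute loop can be sketched in plain Python on the sample dataset. This is a minimal version, not the Spark implementation: it assumes k = 2 and uses two fixed starting centers instead of random picks so the result is reproducible.

```python
import math


def kmeans(points, centers, max_iter=20):
    """Lloyd's algorithm: alternate assignment and center updates until stable."""
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, clusters


points = [
    (0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
    (3.0, 3.0, 3.0), (3.1, 3.1, 3.1), (3.2, 3.2, 3.2),
]
initial = [points[0], points[3]]  # assumption: fixed seeds rather than random ones
centers, clusters = kmeans(points, initial)
```

The two centers settle near (0.1, 0.1, 0.1) and (3.1, 3.1, 3.1), with three points in each cluster.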
Decision	Trees	&	Random	Forests	
(Supervised	Learning)
Why Decision Trees?
• Simple to understand and interpret. (And explain to executives.)
• Requires little data preparation. (Other techniques often require data normalisation, creation of dummy variables, and removal of blank values.)
• Performs well with large datasets.
Visual	Intro	to	Decision	Trees
• http://www.r2d3.us/visual-intro-to-machine-learning-part-1
Random Forest (Ensemble Model)
• Main idea: build an ensemble of simple decision trees
• Each tree is simple and less likely to overfit
• Classify/predict by voting between all trees
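The voting step can be sketched as follows. This shows only the ensemble vote: the "trees" are hand-written one-split stumps with invented thresholds, whereas a real random forest would train each tree on a bootstrap sample with random feature subsets.

```python
from collections import Counter


def make_stump(feature_index, threshold):
    """Return a one-split 'tree' that predicts 1 when the feature exceeds the threshold."""
    def stump(x):
        return 1 if x[feature_index] > threshold else 0
    return stump


def forest_predict(trees, x):
    """Classify x by majority vote over all trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical ensemble of three stumps, each looking at a different feature.
trees = [make_stump(0, 0.5), make_stump(1, 0.3), make_stump(2, 0.8)]
sample = (0.9, 0.4, 0.1)
prediction = forest_predict(trees, sample)  # votes: 1, 1, 0
```

Two of the three stumps vote 1 for the sample, so the ensemble predicts 1 even though one tree disagrees.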
Decision	Tree	vs	Random	Forest
Why Do Ensembles Work?
Overcome limitations of a single hypothesis
[Figure panels: Decision Tree; Model Averaging]
Diabetes	Dataset	– Decision	Trees	/	Random	Forest
Labeled	set	with	8	Features
-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 5:-1 6:0.00149028 7:-0.53117 8:-0.0333333
+1 1:-0.882353 2:-0.145729 3:0.0819672 4:-0.414141 5:-1 6:-0.207153 7:-0.766866 8:-0.666667
-1 1:-0.0588235 2:0.839196 3:0.0491803 4:-1 5:-1 6:-0.305514 7:-0.492741 8:-0.633333
+1 1:-0.882353 2:-0.105528 3:0.0819672 4:-0.535354 5:-0.777778 6:-0.162444 7:-0.923997 8:-1
-1 1:-1 2:0.376884 3:-0.344262 4:-0.292929 5:-0.602837 6:0.28465 7:0.887276 8:-0.6
+1 1:-0.411765 2:0.165829 3:0.213115 4:-1 5:-1 6:-0.23696 7:-0.894962 8:-0.7
-1 1:-0.647059 2:-0.21608 3:-0.180328 4:-0.353535 5:-0.791962 6:-0.0760059 7:-0.854825 8:-0.833333
...
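Each row above is in LIBSVM text format: a label followed by 1-based `index:value` pairs. A minimal parsing sketch in plain Python follows. The function name is made up, and `num_features=8` is taken from the slide.

```python
def parse_libsvm_line(line, num_features=8):
    """Parse 'label idx:val idx:val ...' into (label, dense feature list)."""
    parts = line.split()
    label = int(parts[0])  # "+1" and "-1" parse as 1 and -1
    features = [0.0] * num_features  # indices not listed stay at 0.0
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index) - 1] = float(value)  # LIBSVM indices are 1-based
    return label, features


line = ("-1 1:-0.294118 2:0.487437 3:0.180328 4:-0.292929 "
        "5:-1 6:0.00149028 7:-0.53117 8:-0.0333333")
label, features = parse_libsvm_line(line)
```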
Machine	Learning	in	Spark
Spark	Ecosystem
Spark Core
Spark SQL | Spark Streaming | MLlib | GraphX
Machine Learning with Spark (MLlib & ML)
MLlib
• Original “lower-level” API
• Built on top of RDDs
• Maintenance mode starting with Spark 2.0
ML
• Newer “higher-level” API for constructing workflows
• Built on top of DataFrames
Both: algorithms implemented to take advantage of data parallelism
Supervised Learning: End-to-End Flow
Training (batch): Data items + Labels → Feature Extraction → Feature Matrix (training set) → Train the Model → Model
Predicting (real time or batch): Data item → Feature Extraction → Feature Vector → Model → Predict → Label
Spark ML: Spark API for building ML pipelines
[Diagram: a Pipeline chains Feature transform 1, Feature transform 2, Combine features, and Random Forest. Fitting the Pipeline on the Input DataFrame (TRAIN) produces a Pipeline Model, which turns the Input DataFrame (TEST) into the Output DataFrame (PREDICTIONS).]
Spark ML Pipeline
• A Pipeline is trained with fit(), which returns a Pipeline Model that predicts with transform()
– fit() is for training
– transform() is for prediction
[Diagram: Pipeline.fit() on the Input DataFrame (TRAIN) yields a Pipeline Model; its transform() maps the Input DataFrame (TEST) to the Output DataFrame (PREDICTIONS).]
model = pipe.fit(trainData) # Train model
results = model.transform(testData) # Test model
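The fit()/transform() split can be illustrated without Spark. The toy classes below are hypothetical and only mimic the pattern: fit() learns something from training data and returns a separate model object, and that model's transform() applies what was learned to new data.

```python
class MeanCenterer:
    """Toy estimator: fit() learns the training mean and returns a model."""

    def fit(self, rows):
        mean = sum(rows) / len(rows)
        return MeanCentererModel(mean)


class MeanCentererModel:
    """Toy fitted model: transform() subtracts the mean learned during fit()."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [x - self.mean for x in rows]


train_data = [1.0, 2.0, 3.0]  # hypothetical training values
test_data = [2.0, 5.0]        # hypothetical unseen values

model = MeanCenterer().fit(train_data)  # learn from training data only
results = model.transform(test_data)    # apply to new data
```

Keeping the learned state in a separate model object means the test data never influences what was learned during training.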
Spark ML – Simple Random Forest Example
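from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import HashingTF, StringIndexer, Tokenizer, VectorAssembler

# Index the categorical "district" column and hash the tokenized text into 50 features.
indexer = StringIndexer(inputCol="district", outputCol="dis-inx")
parser = Tokenizer(inputCol="text-field", outputCol="words")
hashingTF = HashingTF(numFeatures=50, inputCol="words", outputCol="hash-inx")
# Combine both feature columns into the single vector column the classifier reads.
vecAssembler = VectorAssembler(inputCols=["dis-inx", "hash-inx"], outputCol="features")
rf = RandomForestClassifier(numTrees=100, labelCol="label", seed=42)
pipe = Pipeline(stages=[indexer, parser, hashingTF, vecAssembler, rf])
model = pipe.fit(trainData)  # Train model
results = model.transform(testData)  # Test model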
Apache	Zeppelin	– A	Modern	Web-based	Data	Science	Studio
• Data exploration and discovery
• Visualization
• Deeply integrated with Spark and Hadoop
• Pluggable interpreters
• Multiple languages in one notebook: R, Python, Scala
Exporting ML Models - PMML
• Predictive Model Markup Language (PMML)
• Supported models
– K-Means	
– Linear	Regression
– Ridge	Regression	
– Lasso
– SVM
– Binary
Additional Resources
• Machine	Learning
• Natural	Language	Processing	(NLP)
• Scalable	Machine	Learning
• Introduction	to	Statistics
Lab Overview
tinyurl.com/hwx-intro-to-ml-with-spark
Hortonworks	Community	Connection
Read access for everyone, join to participate and be recognized
• Full	Q&A	Platform	(like	StackOverflow)
• Knowledge	Base	Articles
• Code	Samples	and	Repositories
Community	Engagement
community.hortonworks.com
7,500+
Registered	Users
15,000+
Answers
20,000+
Technical	Assets
One Website!
Robert	Hryniewicz
@RobHryniewicz
Thanks!

Data Science Crash Course Hadoop Summit SJ