Podling Hivemall	in	the	Apache	
Incubator
Research	Engineer
Makoto	YUI	@myui
<myui@treasure-data.com>
2016/11/08 Apache Hadoop Meetup at CWT 2016
Hivemall entered the Apache Incubator on Sept 13, 2016 🎉
hivemall.incubator.apache.org
@ApacheHivemall
Initial committers
• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  ➢ Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  ➢ Hivemall on Apache Pig
  ➢ Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  ➢ Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>
Project mentors
Champion
Nominated Mentors
• Reynold Xin <Databricks, ASF member>
  Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>
  Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>
  Apache Spark PMC member
• Roman Shaposhnik <Pivotal, ASF member>
  Apache Bigtop/Incubator PMC member
What	is	Apache	Hivemall
Scalable	machine	learning	library	
built	as	a	collection	of	Hive	UDFs
• Multi/Cross platform
• Versatile
• Scalable
• Ease-of-use
Hivemall	is	easy	and	scalable	…
[Figure: Classification with Mahout]
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) AS weight
FROM (
  SELECT logress(features, label, ..) AS (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
ML	made	easy	for	SQL	developers
Born	to	be	parallel	and	scalable
This SQL query automatically runs in parallel on a Hadoop cluster
Ease-of-use
Scalable
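The reduce-side step of the query above, `avg(weight) ... GROUP BY feature`, can be sketched in plain Python. This is only an illustration of model averaging under stated assumptions, not Hivemall's implementation: the function name `average_weights`, the feature names, and the weight values are all hypothetical, and each dictionary stands in for the (feature, weight) rows emitted by one map task.

```python
from collections import defaultdict


def average_weights(partial_models):
    """Average each feature's weight over the partial models that contain it.

    Mirrors `avg(weight) ... GROUP BY feature`: a feature missing from a
    partial model contributes no row, so it is not counted in the average.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in partial_models:  # one partial model per map task
        for feature, weight in model.items():
            sums[feature] += weight
            counts[feature] += 1
    return {feature: sums[feature] / counts[feature] for feature in sums}


# Hypothetical (feature, weight) outputs of two map tasks.
mapper_outputs = [
    {"age": 0.5, "income": 1.0},
    {"age": 1.5, "clicks": -2.0},
]

lr_model = average_weights(mapper_outputs)
print(lr_model)
```

Because each feature is averaged independently of the others, the grouping can be spread across reducers, which is what the `GROUP BY feature` shuffle in the query expresses.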
Hivemall is Multi/Cross platform ..
Hivemall is a multi/cross-platform ML library
• HiveQL
• SparkSQL/DataFrame API
• Pig Latin
Prediction models built by Hive can be used from Spark, and conversely, prediction models built by Spark can be used from Hive
Hivemall	on	Apache	Hive
Hivemall	on	Apache	Spark	Dataframe
Hivemall	on	SparkSQL
Hivemall	on	Apache	Pig
Versatile
Hivemall	is	a	Versatile	library	..
✓ Hivemall is not only for Machine Learning
✓ Hivemall provides a bunch of generic utility functions (e.g., top-k, NLP)
Each organization has its own set of UDFs for data preprocessing!
Don’t Repeat Yourself!
Conclusion	and	Takeaway
Hivemall	is	a	machine	learning	library	that	is	…
• Multi/Cross platform
• Versatile
• Scalable
• Ease-of-use
We welcome your contributions to Apache Hivemall ☺
hivemall.incubator.apache.org
