CREATING THE FUTURE
OF BIG DATA THROUGH
“THE APACHE WAY”
WHY THIS MATTERS TO THE
COMMUNITY
Dr. Justin R. Erenkrantz, Bloomberg LP
justin@erenkrantz.com / @jerenkrantz
WHY SHOULD I PAY ATTENTION?
»  Mentor to Apache Geode and HAWQ
»  Committer to Apache HTTP Server, APR, Subversion, Serf
»  Former President and Director of The Apache Software Foundation
»  Ph.D. from University of California, Irvine
»  Dissertation: “Computational REST: A New Model for Decentralized, Internet-Scale Applications”
»  Head of Compute Architecture at Bloomberg LP
»  ~50 billion ticks DAILY flow through our systems
2
TECH @ BLOOMBERG: OPEN SOURCE
3	
»  The core of our Bloomberg Professional platform has evolved away from proprietary code
»  Foundations of our next-generation infrastructure (OpenStack, Ceph, Hadoop, Spark, Solr, Chromium, Chef) are all open-source
»  No longer can vendors tell us that they won’t fix a critical bug
»  Places a lot of pressure on our partners to collaborate openly
»  Giving back to the community: https://github.com/bloomberg/
»  Allows us to innovate at the higher levels, helping our customers make sense of the firehose of information that is available to them
TECH @ BLOOMBERG: OPEN CAN BE HARDWARE TOO!
4
HISTORY LESSON…
5	
» Started as the Apache Group with 8 members in Feb 1995, resuming work on NCSA httpd
» UIUC placed the server code in the public domain
» Most of the UIUC team left to join Netscape
» Webmasters were left in the lurch and joined together
» The Apache Software Foundation incorporated in 1999
» Today, there are over 350 communities affiliated with Apache, performing over 16,000 code commits/month
Why?
PHILOSOPHY OF THE APACHE SOFTWARE FOUNDATION
6	
» Let the contributors do what they do best: contribute. The Foundation exists to do the rest.
» Does not pay for contributions
» Many are sponsored by a third party
» The ASF’s staff focus on infrastructure, PR, etc.
» Does not pick “winners” or “losers”
» “Competition” between ASF projects is perfectly acceptable as long as there are healthy communities…think Geode and Ignite (!)
ANTI-PHILOSOPHY
7	
» “The Apache Way” is not…
» Dumping your code on GitHub
» Single-sponsor contributions
» Running a project under a Benevolent Dictator for Life (BDFL)
» The Apache Software Foundation may not be best for all projects...that’s perfectly OK.
» If you wish to be part of Apache, you need to adhere to its social constructs and norms
» Technical decisions are up to the community
ROLE OF APACHE INCUBATOR
8	
» Each project (TLP) is run relatively autonomously
» Project karma does not automatically carry over
» If I can commit to Geode, it doesn’t mean I can commit to Ignite! (But I could likely earn it easily!)
» The Incubator was formed in 2003 as we were struggling to scale the Foundation and repeat the model. It worked.
» If a podling does not have a healthy community, it’ll never graduate. That’s OK. If the podling does become a TLP but later loses its community, it’ll end up in the Attic. That’s OK, too.
TRANSPARENCY & MERITOCRACY
9	
»  Roy’s Mantra: “If it's not on the list, it didn't happen.”
»  Apache in the age of GitHub, JIRA, ReviewBoard, etc.
»  Is the mailing list doomed?
»  Generation gap may mean email isn’t preferred
»  Tools are always secondary to process
»  Transparency is the aim: allows others to have a voice
»  The tools and process are never about prohibiting face-to-face contact, but about ensuring equal access to participation and permitting asynchronous decision making
»  Making decisions in a synchronous echo chamber (Slack, IRC, etc.) is not conducive to transparency
MAKING DECISIONS
10	
»  Voting is the way contributors are (and feel) empowered
»  “Binding” votes from recognized contributors (PMC)
»  Vote on code, ideas, and, most importantly, releases
»  Minimum acceptable quorum: 3 binding votes
»  Minimum acceptable time frame: 72 hours
»  The power of the dreaded “-1” (veto)
»  Code can be vetoed, but not releases
»  A veto should be a last resort, used to foster discussion
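The voting rules on this slide are concrete enough to express as a small tally function. What follows is a minimal sketch, not any Apache tool. It assumes each voter casts exactly one ballot, that binding votes come from PMC members, and that a release passes with at least three binding +1 votes and more binding +1 than -1 votes (the ASF's majority-approval rule for releases). It also assumes that for code a single binding -1 is a veto; the real requirement that a veto carry a technical justification is not modeled. `Ballot`, `VoteKind`, `tally`, and the voter names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

# Thresholds taken from the slide above.
MIN_VOTING_PERIOD = timedelta(hours=72)
MIN_BINDING_APPROVALS = 3


class VoteKind(Enum):
    CODE = "code"        # a binding -1 is a veto
    RELEASE = "release"  # majority approval; releases cannot be vetoed


@dataclass(frozen=True)
class Ballot:
    voter: str
    value: int     # +1, 0, or -1
    binding: bool  # True when the voter is on the PMC


def tally(kind: VoteKind, ballots: list[Ballot],
          opened: datetime, closed: datetime) -> tuple[bool, str]:
    """Return (passed, reason) for a vote, assuming one ballot per voter."""
    if closed - opened < MIN_VOTING_PERIOD:
        return False, "closed before the 72-hour minimum"

    # Only binding votes count toward the result; 0 votes are abstentions.
    plus = sum(1 for b in ballots if b.binding and b.value == 1)
    minus = sum(1 for b in ballots if b.binding and b.value == -1)

    if plus < MIN_BINDING_APPROVALS:
        return False, f"{plus} binding +1 votes; at least {MIN_BINDING_APPROVALS} needed"
    if kind is VoteKind.CODE and minus > 0:
        return False, "vetoed by a binding -1"
    if kind is VoteKind.RELEASE and minus >= plus:
        return False, "binding -1 votes are not outnumbered by +1 votes"
    return True, "passed"


start = datetime(2016, 3, 1, 12, 0)
end = start + timedelta(hours=72)
ballots = [
    Ballot("alice", 1, True),
    Ballot("bob", 1, True),
    Ballot("carol", 1, True),
    Ballot("dave", -1, True),
    Ballot("erin", 1, False),  # non-binding: encouraging, but not counted
]
print(tally(VoteKind.RELEASE, ballots, start, end))  # (True, 'passed')
print(tally(VoteKind.CODE, ballots, start, end))     # (False, 'vetoed by a binding -1')
```

The same ballots produce different outcomes depending on what is being voted on. That is the slide's point: a single -1 stops a code change but cannot block a release that has majority approval.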
GROWING COMMUNITY
11	
» Contributions can come from anywhere
» Relies upon core contributors being open to ideas
» Yet, there often is a set of agreed-upon principles
» Going to the Geode community and saying that it should remove all consistency code is a non-starter
» This is the power of the mythical “Apache Way”
» Meritocracy: access based on demonstrated skills
» Michael Young's The Rise of the Meritocracy (1958): negative connotations when applied across an entire society
GROWING COMMUNITY
12	
» As a downstream consumer of Apache projects: will there be someone who is maintaining the code base? Can I volunteer to help maintain it?
» A codebase by itself is inert
» Code is never perfect, but a healthy and inclusive community will constantly improve the code based on feedback from others
» “Community	over	Code”
ROLES IN INCUBATOR
13	
» Think of a podling as being provided a set of training wheels as it learns the rules of the road.
» Required quarterly reporting is one of the few mechanisms that the Board imposes on all projects to ensure that the community is healthy.
» If no one submits the report, no one may be home!
» Mentors are around to answer questions and share knowledge and best practices. Mentors are not there to contribute code; often we could, but that role is distinct.
NORMS OF THE COMMUNITY
14	
» Over the years, most disputes I have seen come down to norms that were not agreed upon or documented
» Forming an explicit consensus on release versioning and compatibility rules up front is incredibly helpful.
» Projects always have a tension between “new features” and compatibility. Decide early on where the community wants to be.
» The Geode wiki section is great. Keep it up!
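As one illustration of what a written-down compatibility rule can look like, assume a community has agreed on Semantic Versioning. The slide itself does not prescribe any particular scheme. Once agreed, the rule is mechanical enough to check in code. `parse_version` and `is_compatible_upgrade` are hypothetical helpers for this sketch, not part of Geode or any Apache tooling. Pre-release tags such as `-incubating` are deliberately left unhandled.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string; pre-release and build tags
    (e.g. '1.0.0-incubating') are not handled and raise ValueError."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible_upgrade(installed: str, candidate: str) -> bool:
    """True when moving from `installed` to `candidate` should not break
    users under Semantic Versioning rules."""
    old, new = parse_version(installed), parse_version(candidate)
    if old[0] == 0:
        # SemVer reserves 0.y.z for initial development: anything may change,
        # so this sketch only treats an identical version as compatible.
        return new == old
    # Same MAJOR version and not a downgrade: only additive or bug-fix changes.
    # Tuples compare element by element, so (1, 4, 0) >= (1, 2, 3).
    return new[0] == old[0] and new >= old


for installed, candidate in [("1.2.3", "1.4.0"), ("1.2.3", "2.0.0"), ("1.2.3", "1.2.1")]:
    print(installed, "->", candidate, is_compatible_upgrade(installed, candidate))
```

The loop reports `True` for the minor upgrade, and `False` for the major bump and the downgrade. The value of such a rule lies less in the code than in the community having agreed on it before the first dispute.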
EXPECTATIONS FOR CONTRIBUTORS
15	
» Explicitly communicating to contributors who are not yet on the PMC what the expectations are for receiving commit access (via a vote) is extremely helpful.
» It’s painful to see contributors who do not feel empowered by the community. It’s a huge red flag.
» Each project can and should set its own bar.
» My gut feeling now is to err on the side of inclusiveness and give commit rights earlier than I used to. It’s all under version control anyway. Worst case, revoke that person’s bit.
GRADUATION
16	
» When will Apache Geode graduate from the Incubator?
» “When it's ready” is the only honest answer.
» The Geode community needs to demonstrate that it can govern itself and be inclusive and transparent
» It doesn’t have to be perfect; no community is.
» This is where the Board can be extremely helpful.
» I am extremely happy to see the progress that Geode has made so far and wish it the very best on its path.
Join the Apache Geode Community!
•  Check out: http://geode.incubator.apache.org
•  Subscribe: user-subscribe@geode.incubator.apache.org
•  Download: http://geode.incubator.apache.org/releases/
18	
THANKS!
Dr. Justin R. Erenkrantz, Bloomberg LP
justin@erenkrantz.com / @jerenkrantz


#GeodeSummit Keynote: Creating the Future of Big Data Through “The Apache Way”