Apache Geode (incubating), and Pivotal's leadership role in open sourcing GemFire
Nitin Lamba
Agenda
• Pivotal’s Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A
In 2015, Pivotal released the components of its Big Data Suite as open source:
• 6 million lines of code
• 4 new open source communities
[Figure: timeline with four dates: May 2015, Sept 2015, Sept 2015, Oct 2015]
From GEMFIRE to GEODE…
What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting-edge use cases
[Figure: timeline spanning <2000, 2004, 2008, 2012, 2016, with four phases]
Early drivers
• Data volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time to market needs
• Flexible data models
• Persistent + in-memory
Global data
• Visibility across DC
• Fast ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache Incubation
• GemFire > Geode
• Geode M1 release
• 1st Geode Summit
Customer segments: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto Insurance, Payroll processing, Rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3 TB data in-memory
• 17B records in-memory
• 120K concurrent users
… and impacting a LOT of people!
• China Railway Corporation
• Indian Railways
17% + 19% = 36% of the world population
High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: a peer-to-peer in-memory distributed system made up of a Locator and four Servers, with REST access]
* Experimental and awaiting community feedback
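
The three persistence modes named on this slide differ in when the backing store sees a write. The sketch below is only an illustration in plain Java, not Geode code: the class `BackedCacheSketch`, its method names, and the use of a `Map` to stand in for an RDBMS or file are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only; none of these names are Geode classes.
class BackedCacheSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> backingStore; // stands in for an RDBMS or file
    private final Queue<String[]> writeBehindQueue = new ArrayDeque<>();

    BackedCacheSketch(Map<String, String> backingStore) {
        this.backingStore = backingStore;
    }

    // Read-through: on a miss, load from the backing store and keep the value in memory.
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = backingStore.get(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Write-through (sync): the backing store is updated before the call returns.
    void putWriteThrough(String key, String value) {
        backingStore.put(key, value);
        memory.put(key, value);
    }

    // Write-behind (async): only memory is updated now; the store write is queued.
    void putWriteBehind(String key, String value) {
        memory.put(key, value);
        writeBehindQueue.add(new String[] {key, value});
    }

    // Drains queued writes to the store and returns how many were written.
    // A real system would run this on a background thread.
    int flush() {
        int flushed = 0;
        String[] entry;
        while ((entry = writeBehindQueue.poll()) != null) {
            backingStore.put(entry[0], entry[1]);
            flushed++;
        }
        return flushed;
    }
}
```

The trade-off is visible in the two `put` methods: write-through pays the store latency on every write, while write-behind returns sooner but leaves a window in which the store lags behind memory.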
What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
Let’s talk about a few BASIC CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• A collection of Regions
What is a REGION?
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant across cache Members
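
To make the "observable map" idea concrete, here is a minimal sketch in plain Java. It is not the Geode Region API: `ObservableRegionSketch`, `addListener` and the listener shape are hypothetical names chosen for the example, and the map lives in a single process.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical illustration only: a key/value map that notifies listeners on every put.
class ObservableRegionSketch<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    // Registers a callback that receives the key and new value of each put.
    void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    // Stores the value, then notifies every listener; returns the previous value or null.
    V put(K key, V value) {
        V previous = entries.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
        return previous;
    }

    V get(K key) {
        return entries.get(key);
    }
}
```

Callers use it like any map (`put`, `get`), while interested parties react to changes instead of polling, which is the sense of "observable (reactive)" on the slide.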
Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region shortcuts:
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
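
The LRU and overflow options can be pictured with a small plain-Java sketch. This is an assumption-laden illustration, not how Geode implements eviction: `LruOverflowSketch` is a hypothetical class, the limit is an entry count rather than heap usage, and a second `Map` stands in for disk.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: least-recently-used entries overflow to a "disk" map.
class LruOverflowSketch {
    private final int maxInMemory;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, String> memory = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, String> overflow = new HashMap<>(); // stands in for disk

    LruOverflowSketch(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    void put(String key, String value) {
        overflow.remove(key); // the in-memory copy becomes the only copy
        memory.put(key, value);
        evictIfNeeded();
    }

    // Returns the value, faulting it back into memory if it had overflowed.
    String get(String key) {
        String value = memory.get(key); // counts as a use for LRU ordering
        if (value == null && overflow.containsKey(key)) {
            value = overflow.remove(key);
            memory.put(key, value);
            evictIfNeeded();
        }
        return value;
    }

    boolean isInMemory(String key) {
        return memory.containsKey(key); // containsKey does not change access order
    }

    // Moves least-recently-used entries to overflow until memory is within its limit.
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, String>> it = memory.entrySet().iterator();
        while (memory.size() > maxInMemory && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            overflow.put(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }
}
```

With overflow, eviction moves an entry out of memory rather than destroying it, so a later `get` still finds the value at the cost of a slower lookup.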
Persistent Regions
• Durability
• Write-ahead log (WAL) for efficient writing
• Consistent recovery
• Compaction
[Diagram: Server 1 … Server N]
What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator, Server]
What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can run OQL queries on local data
• Can be notified about events on the servers
Persistence - Shared Nothing
[Diagram, repeated across slides 20 to 25: Server 1, Server 2 and Server 3 hold buckets B1, B2 and B3 as primary and secondary copies]
• Server 1 waits for others when it starts
• Fetches missed operations on restart
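
The primary/secondary bucket layout in the diagram can be sketched as follows. This is a simplified illustration under stated assumptions, not Geode's placement algorithm: `BucketPlacementSketch` is a hypothetical class, keys map to buckets by hash code, and copies are placed round-robin so that a bucket's primary and secondary never share a server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: round-robin placement is an assumption for this sketch.
class BucketPlacementSketch {
    private final int bucketCount;
    private final List<String> servers;

    BucketPlacementSketch(int bucketCount, List<String> servers) {
        if (bucketCount < 1 || servers.size() < 2) {
            throw new IllegalArgumentException("need at least one bucket and two servers");
        }
        this.bucketCount = bucketCount;
        this.servers = new ArrayList<>(servers);
    }

    // Maps a key to a bucket; floorMod keeps the result non-negative.
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    String primaryFor(int bucket) {
        return servers.get(bucket % servers.size());
    }

    // The redundant copy goes on the next server, so it never shares a host with the primary.
    String secondaryFor(int bucket) {
        return servers.get((bucket + 1) % servers.size());
    }

    // Names the server that can still serve the bucket after one server fails.
    String survivingCopy(int bucket, String failedServer) {
        String primary = primaryFor(bucket);
        return primary.equals(failedServer) ? secondaryFor(bucket) : primary;
    }
}
```

Because every bucket lives on two different servers, the loss of any single server leaves one copy of each bucket available, which is what lets a restarted server fetch the operations it missed from its peers.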
Persistence - Operational Logs
[Diagram: Member 1 handles "Put k6->v6" by appending to the operation log; log files Oplog1.crf and Oplog2.crf]
Records shown in the logs:
• Create k1->v1
• Create k2->v2
• Modify k1->v3
• Create k4->v4
• Modify k1->v5
• Create k6->v6
Persistence - Operational Logs: Compaction
[Diagram: the same operation log records (Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6) in Oplog1.crf and Oplog2.crf, with one added step]
• Copy live data forward
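
The append-then-compact cycle shown on these two slides can be sketched in a few lines of plain Java. This is an illustration under assumptions, not Geode's oplog format: `OplogSketch` is a hypothetical class, the log is an in-memory list rather than a `.crf` file, and Create and Modify are both stored as key/value writes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an append-only log with replay and compaction.
class OplogSketch {
    // One appended record; Create and Modify are both modeled as a key/value write.
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Record> log = new ArrayList<>();

    // Writes never update a record in place; they only append.
    void append(String key, String value) {
        log.add(new Record(key, value));
    }

    int size() {
        return log.size();
    }

    // Recovery: replay the log from the start; the last record for each key wins.
    Map<String, String> recover() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Record record : log) {
            state.put(record.key, record.value);
        }
        return state;
    }

    // Compaction: copy only the live (latest) record per key forward into a new log.
    OplogSketch compact() {
        OplogSketch compacted = new OplogSketch();
        for (Map.Entry<String, String> live : recover().entrySet()) {
            compacted.append(live.getKey(), live.getValue());
        }
        return compacted;
    }
}
```

Replaying the six records from the slide yields four live keys (k1, k2, k4, k6), so the compacted log holds four records instead of six and recovers to the same state.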
Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
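
A data-oriented function, in the Map/Reduce sense mentioned above, runs once per partition where the data lives and then combines the partial results. The sketch below shows only that shape in plain Java; it is not Geode's function execution API. `FunctionExecutionSketch` and its methods are hypothetical, partitions are ordinary in-process maps, and a parallel stream stands in for execution on separate members.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: scatter a function over partitions, then gather results.
class FunctionExecutionSketch {
    // Applies the function to each partition in parallel; results keep partition order.
    static <R> List<R> executeOnPartitions(List<Map<String, Integer>> partitions,
                                           Function<Map<String, Integer>, R> function) {
        return partitions.parallelStream().map(function).collect(Collectors.toList());
    }

    // Map step: sum the values inside each partition. Reduce step: add the partial sums.
    static int sumAll(List<Map<String, Integer>> partitions) {
        List<Integer> partialSums = executeOnPartitions(
                partitions,
                partition -> partition.values().stream().mapToInt(Integer::intValue).sum());
        return partialSums.stream().mapToInt(Integer::intValue).sum();
    }
}
```

Only the small partial results travel back to the caller, which is the point of sending the function to the data rather than pulling the data to the function.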
Join the Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: user-subscribe@geode.incubator.apache.org
• Download: http://geode.incubator.apache.org/releases/
Thank you!
Additional Slides
Built for PERFORMANCE…
[Chart: operations per second (axis 0 to 1,000,000) across YCSB workloads: A reads, A updates, B reads, B updates, C reads, D inserts, D reads, F reads, F updates; legend: "Cassandra", "Geo"]
…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % versus number of server hosts (2, 4, 6, 8, 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
High Availability
Pivotal's effort on Apache Geode
