Pivotal's effort on Apache Geode

Apache Geode (incubating),
and Pivotal's leadership role
in open sourcing (GemFire)
Nitin Lamba

Agenda
Pivotal’s Open Source strategy
What is Apache Geode?
History
Differentiators
Basic Concepts
Resources
Q & A

In 2015, Pivotal contributed the components of its Big Data Suite to open source:
• 6 million lines of code
• 4 new open source communities

[Timeline: May 2015, Sept 2015, Sept 2015, Oct 2015]

What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture

Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting edge use cases
Timeline: <2000, 2004, 2008, 2012, 2016
Early drivers
• Data volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs
Real-time needs
• Real-time response
• Time to market needs
• Flexible data models
• Persistent + in-memory
Global Data
• Visibility across DC
• Fast ingest
• Device to enterprise
• Uptime (always on)
Open Source!
• Apache Incubation
• GemFire > Geode
• Geode M1 release
• 1st Geode Summit
Use cases: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto Insurance, Payroll processing, Rail systems

…with both SCALE and SPEED, …
• 40K transactions per second
• 3TB data in-memory
• 17B records in-memory
• 120K concurrent users

… and impacting a LOT of people!
China Railway Corporation and Indian Railways: 17% + 19% = 36% of the world population

High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/ store
• Data replicated or partitioned
• Redundant storage in-memory/ disk
• Flexible data retention policies
[Diagram: A Peer-2-Peer in-memory Distributed System: one Locator, four Servers, REST]
* Experimental and awaiting community feedback
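The sync and async persistence options listed on this slide can be illustrated with a small sketch. It is not Geode code: `BackingStore` and `SketchCache` are hypothetical names, and a plain dict stands in for the filesystem or RDBMS. It only shows how read-through, write-through and write-behind differ in when the backing store is touched.

```python
from collections import deque


class BackingStore:
    """Hypothetical stand-in for a filesystem or RDBMS (dict-backed)."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.reads = 0
        self.writes = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def save(self, key, value):
        self.writes += 1
        self.rows[key] = value


class SketchCache:
    """Illustrative cache; the names and behavior are assumptions."""

    def __init__(self, store, write_behind=False):
        self.store = store
        self.data = {}
        self.write_behind = write_behind
        self.pending = deque()  # writes queued for the store (write-behind)

    def get(self, key):
        # Read-through: on a miss, load synchronously from the store.
        if key not in self.data:
            value = self.store.load(key)
            if value is not None:
                self.data[key] = value
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value
        if self.write_behind:
            # Write-behind: acknowledge now, persist later.
            self.pending.append((key, value))
        else:
            # Write-through: persist before returning.
            self.store.save(key, value)

    def flush(self):
        # Drain the queue in arrival order.
        while self.pending:
            self.store.save(*self.pending.popleft())
```

With write-behind, a put returns before the store sees it, so the store lags until `flush()` runs. That is the trade-off against the synchronous options.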

What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks

Let’s talk about a few BASIC CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions

What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• Collection of Regions

What is a REGION?
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
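The "observable map" idea can be sketched as a key/value store that notifies registered listeners on every write. This is a single-process illustration, not the Geode API: `SketchRegion`, `add_listener` and the event tuple layout are all assumptions made for the example.

```python
class SketchRegion:
    """Illustrative observable key/value store (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self._entries = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(event, key, old_value, new_value)
        self._listeners.append(callback)

    def put(self, key, value):
        event = "update" if key in self._entries else "create"
        old = self._entries.get(key)
        self._entries[key] = value
        for notify in self._listeners:
            notify(event, key, old, value)

    def get(self, key):
        return self._entries.get(key)
```

A caller registers a callback once and then reacts to changes instead of polling, which is the sense of "reactive" in the bullet above.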

Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
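The names in this list are built from the options on the slide, so each one can be read as a base type plus flags. The sketch below only splits the name. `describe_shortcut` is a hypothetical helper, and it says nothing about what each flag does at runtime (PROXY, for instance, is not explained on the slide).

```python
def describe_shortcut(name):
    """Split a region type name from the list above into its parts.

    Purely a naming aid (assumption): it reads the tokens in the name.
    """
    tokens = name.split("_")
    return {
        "type": tokens[0],  # LOCAL, PARTITION or REPLICATE
        "proxy": "PROXY" in tokens,
        "redundant": "REDUNDANT" in tokens,
        "persistent": "PERSISTENT" in tokens,
        "overflow": "OVERFLOW" in tokens,
        "heap_lru": "HEAP_LRU" in name,
    }
```

For example, `describe_shortcut("PARTITION_REDUNDANT_PERSISTENT_OVERFLOW")` reports a partitioned type with the redundant, persistent and overflow flags set.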

Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
[Diagram: Server 1 … Server N]

What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator, Server]

What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers

Persistence - Shared Nothing
[Diagram: Server 1, Server 2, Server 3]

[Diagram sequence: buckets B1, B2 and B3, each with a Primary and a Secondary copy spread across the servers]
Server 1 waits for others when it starts
Fetches missed operations on restart
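The two captions above describe a restarting server catching up from peers that kept running. The sketch below is a toy model of that idea under strong assumptions: each server keeps its own ordered log, and "most recent" is taken to mean "longest log". `SketchServer`, `put` and `restart` are hypothetical names, and the slides do not say how Geode actually tracks which copy is newest.

```python
class SketchServer:
    """Toy server with its own persisted log (shared nothing)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.log = []  # operations this server has persisted, in order

    def state(self):
        # Replay the log; later writes to a key win.
        data = {}
        for key, value in self.log:
            data[key] = value
        return data


def put(servers, key, value):
    """Apply a write to every online copy; offline copies miss it."""
    for server in servers:
        if server.online:
            server.log.append((key, value))


def restart(server, peers):
    """Bring a server back and fetch the operations it missed.

    Assumption: the peer with the longest log is the most recent, and
    the restarting server's log is a prefix of it.
    """
    server.online = True
    newest = max(peers, key=lambda peer: len(peer.log))
    missed = newest.log[len(server.log):]
    server.log.extend(missed)
    return len(missed)
```

In this model the restarting server cannot trust its own disk alone, which is one way to read "waits for others when it starts": a peer may hold newer data.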

Persistence - Operational Logs
[Diagram: Member 1 receives "Put k6->v6" and appends it to the operation log]
Operation log entries (Oplog1.crf, Oplog2.crf): Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
Persistence - Operational Logs: Compaction
[Diagram: the same operation log entries as on the previous slide (Oplog1.crf, Oplog2.crf); compaction copies live data forward]
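Append-only logging and "copy live data forward" compaction can be sketched as follows. This is an illustration, not Geode's on-disk format: `SketchOplogStore` is a hypothetical class, and the rollover size of three entries per log is an assumption chosen so that the slide's six operations fill two logs.

```python
import itertools


class SketchOplogStore:
    """Toy append-only operation logs with compaction (assumed design)."""

    def __init__(self, max_entries_per_log=3):
        self.max_entries = max_entries_per_log  # hypothetical rollover size
        self.oplogs = [[]]  # oldest first; the last list is the active log
        self.latest = {}  # key -> sequence number of its live record
        self._seq = itertools.count(1)

    def _append(self, op, key, value):
        if len(self.oplogs[-1]) >= self.max_entries:
            self.oplogs.append([])  # roll over to a new log
        seq = next(self._seq)
        self.oplogs[-1].append((seq, op, key, value))
        self.latest[key] = seq

    def put(self, key, value):
        # Writes are only ever appended; nothing is updated in place.
        op = "Modify" if key in self.latest else "Create"
        self._append(op, key, value)

    def compact_oldest(self):
        """Copy live records from the oldest log forward, then drop it."""
        if len(self.oplogs) < 2:
            return 0
        old = self.oplogs.pop(0)
        copied = 0
        for seq, op, key, value in old:
            if self.latest.get(key) == seq:  # still the newest for its key
                self._append(op, key, value)
                copied += 1
        return copied

    def recover(self):
        # Replay all logs in order to rebuild the in-memory state.
        data = {}
        for log in self.oplogs:
            for _seq, _op, key, value in log:
                data[key] = value
        return data
```

Replaying the slide's operations, only k2->v2 in the first log is still live (k1 was modified again later), so compaction copies that one record forward and discards the rest of the old log.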

Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
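A data-oriented function in the Map/Reduce sense runs where the data lives and returns partial results for the caller to combine. The sketch below imitates that in one process, with threads standing in for members. `member_for`, `partition` and `execute_on_region` are hypothetical helpers, and the CRC-based placement is an assumption, not Geode's partitioning scheme.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def member_for(key, n_members):
    # Deterministic placement of a key on a member (assumed scheme).
    return zlib.crc32(str(key).encode()) % n_members


def partition(data, n_members):
    """Spread key/value pairs across n_members local dicts."""
    members = [dict() for _ in range(n_members)]
    for key, value in data.items():
        members[member_for(key, n_members)][key] = value
    return members


def execute_on_region(members, function):
    """Run `function` against each member's local data in parallel.

    Returns one partial result per member, in member order.
    """
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        return list(pool.map(function, members))
```

A caller might compute a total as `sum(execute_on_region(members, lambda local: sum(local.values())))`: each member sums its own entries (map) and the caller adds the partial sums (reduce).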

Join the Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: user-subscribe@geode.incubator.apache.org
• Download: http://geode.incubator.apache.org/releases/

Built for PERFORMANCE…
[Chart: YCSB Workloads, operations per second (0 to 1,000,000), Cassandra vs. Geode, for A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads, F Updates]

…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % versus number of server hosts (2, 4, 6, 8, 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
