Getting to know the Grid - Goto Aarhus 2013

Getting to know the Grid
Syed M Shaaf
Red Hat

Goto; conference Aarhus 2013 | Syed M Shaaf
Quick introduction
Solutions Architect at Redhat Nordics
Red Hat JBoss middleware
@sshaaf @RedHatNordics
http://www.redhat.com

Web/appservers
DB/Storage Integrationservers
Mgmt/Monitor
One Scenario

Web servers
Grid servers
Mgmt/Monitor
Another Scenario
Data Replication and Cache

What is?
● Schema-less key/value
store
● Compatible with
applications written in any
language, using any
framework
● Easy access through APIs
● Consistent hash-based
distribution
● Self-healing
● No single point of failure
● Durability (persistence)
● Memory management
(eviction, expiration)
● XA transactions

JBoss Data Grid and JSR
● JSR-107: Temporary caching API
● JSR-347: Data grids
● Development led by Red Hat
● JSR-346: CDI1.1
● Programming model for data grids
● JSR-317: JPA2
● Data grids form caching API for database via JPA2

7
And then its a matter of
scaling..

Clustering subsystems
• JGROUPS - toolkit for the underlying communication
between nodes . Configured with 2 stacks for communication
UDP (default) and TCP (if the environment is not
multicasting)
• INFINISPAN - data caching and object replication and comes
with 3 preconfigured caches:
• cluster - Replication of objects in a HA cluster
• web - Session replication
• sfsb - Replication of stateful session bean
• hibernate - 2nd level entity caching for JPA/Hibernate
• MODCLUSTER- software LB spreads requests among two or
more nodes

Clustering architecture

Cluster architecture
JGroups
Infinispan
HTTP Session Clustering
EAP Instance
JGroups
Infinispan
HTTP Session Clustering
EAP Instance
Replication

mode=replication
All the data is stored on all cluster nodes
Writes are sent to all nodes
– Every node updates its local cache
Reads are always local
New nodes acquire the initial state from the oldest node
Clients can access any node for reading or writing
Scalability is limited by cluster size and data size
10 nodes with 100MB state each: every node needs 1GB

mode=replication; action=rw
mod_cluster
K V
K1
K2
K3
K V
K1
K2
K3
K V
K1
K2
K3
Replication
rw

Mode=distribution
Data is only stored on N cluster nodes (say N=2)
A consistent hash on a key “id” determines the 2
servers for “id”
– Example: cluster is {A,B,C,D,E,F}
– Hash(“id”) = 8; 8 MOD 6 = 2
– --> Primary owner = B, backup owner = C
Crash of B, new view is {A,C,D,E,F}
– --> Primary owner = D, backup owner = E
– --> C needs to transfer “id” to D and E and remove it
locally
Knowing the key, we always find the right server(s)

mode=distribution; action=w
mod_cluster
K V
K1
K V
K1
K2
K V
K2
Replication

Cross Site replication
Cache B
Cache Manager
Cache A
Bergen
[RELAY]
JGroups
Cache B
Cache Manager
Cache A
Trondheim
Cache B
Cache Manager
Cache A
Oslo
[RELAY]
JGroups
[RELAY]
JGroups

Client and server
Multiple access protocols
Protocol Format Client type Smart? Load balance
and failover
REST text any no external
Memcached text any no pre-defined
HotRod binary Java, C#,
Python
yes auto/dynamic

Advanced functionality
Eviction, expiration, and passivation
● Expiration – defined per entry or cache
● Eviction – FIFO, LRU, unordered, LIRS, none
● Passivation
Step Action Keys in memory Keys on disk
1 Insert K1 K1 n/a
2 Insert K2 K1, K2 n/a
3 Eviction thread - K1 K2 K1
4 Read K1 K1, K2 n/a
5 Eviction thread K2 K1 K2
6 Remove K2 K1 n/a

Why use consistent hashing?
● Cost-effective, speed
benefits
● Deterministic location of
keys
● Sufficient copies for fault
tolerance and durability
but without an
overabundance of copies
Key 372
Value “p”
Key 500
Key 0
Node A
Node C
Node B

Consistent hashing
Hash ring
● Cost-effective, speed benefits
● Deterministic location of keys
● Sufficient copies for fault tolerance
and durability without an
overabundance of copies
Node A
● Stores values of keys 815-1000-330
● Wraps around
Value “m”
● Stored in Key 743
● Based on key value, located on Node
C
Value “p”
● Stored in Key 372
● Based on key value, located on Node
B
Key 743
Value “m”
Key 372
Value “p”
Key 500
Key 0
Node A
Key range [815,330]
Node C
Key range [643,814]
Node B
Key range [331,642]

Consistent hashing
● Event: Node B goes
offline
● Node A
● Now stores keys
815-642
● Node C - unchanged
● Value “m” - unchanged
● Value “p”
● Stored in key 335
● Now located on
Node A
Key 500
Key 843
Value “m”
Key 335
Value “p”
Key 0Key 1000
Node A
Key range [815,642]
Node B
Key range [331,642]
Node C
Key range [643,814]

Consistent hashing – Virtual nodes
● Addresses
irregularities in node
distribution
● Location of entry
determined
algorithmically
● Allocates multiple
blocks throughout the
hash space when a
node joins or leaves
grid
Key 500
Key 843
Value “m”
Key 0
Key 335
Value “p”
Key 1000

JBoss Data Grid conceptual architecture
Client / server
Client
Server
Persistent
store
User
app
Cache
API
L1
cache
Cache
manager
Cache
Cache
Cache
Cache
Cache
loader/store
Cache
loader/store
Persistent
store

Conceptual architecture
Cache API and L1 cache
User application
● End-user interface (i.e. web
application, Java server application)
Cache API
Uses memcached, Hot Rod, or REST
APIs
L1 near cache
● Stores remote cache entries after
they are initially accessed
● For fast retrieval and to prevent
unnecessary remote fetch operations
Client
User
app
Cache
API
L1
cache

Cache and cache manager
Cache manager
● Primary mechanism to retrieve a
cache instance
Cache
● Houses cache instances
Flexible setup
● One cache manager per process
● Multiple caches per cache
manager
● One interface per cache
Cache
manager
Cache
Cache
Cache
Cache

Cache and cache manager
Cache
manager
Cache
Cache
Cache
Cache
Cache configuration
● Locking policy
● Transactions
● Eviction policy
● Expiration policy
● Persistence mechanism
● Backups
● L1 cache policy
Cache manager configuration
● Name / Alias / JNDI
● Start-up policy
● Transport policies
● Caches

Cache store, cache loader, and persistent store
Cache loader
● Ready-only interface – locate
and retrieve data
Cache store
● Cache loader with write
capabilities
Persistent store
● Permanent store for cache
instances and entries (i.e.
relational database)
Persistent
store
Cache
loader/store
Cache
loader/store
Persistent
store

The cache store
● Write-behind or write-
through behavior
● A cache has one or more
cache stores
● Cache stores can be
chained
● Can be loaded or purged on
start
● Open and supported API for
custom stores
● File, JDBC, remote
Persistent
store
Cache
loader/store
Cache
loader/store
Persistent
store

Use case - Local cache
Boost application performance
A more sophisticated HashMap
● Persistence
● Eviction, expiration
● Eliminate OOM
● Warm-start, preload
● Transaction capable (JTA)
● Monitor-able (JMX)
● Events and notifications
● Plugs into many frameworks to
boost performance
Application
Cache BCache A
Database
Ideal for:
● Single processes
● Data unique to a process
● Unshared data

Use case – Data grid
Achieve massive elastic big data scale
● Distributed, horizontally
scalable, unlimited storage
● Move processing to data with
map and reduce
● Low-latency, fast performance
● Eliminate single point of failure
● Built on Red Hat-led JSR-347
(data grids) standards
● Multiple access protocols
● Compatible with applications
written in any language, any
framework
Standalone server C
Database
optional
Application A
CacheCache
Application B
CacheCache
CacheStandalone server B CacheStandalone server A
CacheCacheCacheCache
CacheCache

Use case - Replicated cache
Ultimate failover protection
● Instant reads, linear
performance scalability
● Network overhead scales
linearly
● Limited to a single JVM heap
size
● Replicate the same key/value,
updates across the cluster
Application A’
Application A
Cache BCache A
Database
Application B
Cache BCache A
Ideal for:
● Small, fixed datasets
● Scenarios requiring extremely
high fault tolerance

Use case – Data grid
Achieve massive elastic big data scale
Ideal for:
● Massive distributed
datasets like those from
global, decentralized
locations
● Elastic datasets that
experience large
fluctuations, periodicity, or
unpredictability
● Transferring transaction
loads away from local
cache and traditional
databases
Standalone server C
Database
optional
Application A
CacheCache
Application B
CacheCache
CacheStandalone server B CacheStandalone server A
CacheCacheCacheCache
CacheCache

36
JBoss Data Grid:
Deployment and use patterns

Deployment
Library mode
● “Bring your own” container
● Within one JVM:
● Multiple caches
● One node / cache
● Multiple caches / application
● ‘Cache hit’ is in memory
● Transactions, monitoring, events,
and notifications
JVM
Cache
Cache
Cache
User
application
User
application

Deployment
Client / Server stand-alone mode
● “Remote” clients
● Within one service JVM
● Multiple caches
● One node / cache
● Multiple caches / application
● Cache hit, not in local
memory
● Compatibility - language
agnostic
● Separate app and storage life
cycles
JVM
Data Grid
Cache
Data Grid
CacheCache
User
application
User
application

Usage patterns
Side cache
● Application manages cache
Database
Application
Cache

Usage patterns
Inline cache - Application speaks only to cache
1) App requests data (K1)
2) Cache loader retrieves
from persistent store (K1)
Application
Persistent
store
Cache
Loader
K1
1) App writes data (K2)
2) Cache writes to
persistent store (K2)
K1
Application
Persistent
store
Cache
Store
K2
K2
K2

Searching/Indexing
Cache B
Cache Manager
Cache A
App A. Hibernate Search
App B.
Get Indexed data
Server

Map/Reduce
1. MAP
K V
K1
K2
K3
K V
K1
K2
K3
K V
K1
K2
K3
M
M
M
2. Reduce
R
R
R

Web servers
Grid servers
Mgmt/Monitor
One Scenario
Data Replication and Cache

References
●
Http://www.redhat.com
●
Http://access.redhat.com
● Http://www.openshift.com
●
Http://www.jboss.org/infinispan
●
Http://www.jboss.org/jgroups

Getting to know the Grid - Goto Aarhus 2013

More Related Content

What's hot (20)

Similar to Getting to know the Grid - Goto Aarhus 2013 (20)

More from Syed Shaaf (12)

Recently uploaded (20)

Getting to know the Grid - Goto Aarhus 2013