Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)

Transactions chapter will be presented
during another session

In Memory Data Grid in Action
with Oracle Coherence
for Paris NoSQL User Group

Cyrille Le Clerc

Wednesday, May 25, 2011

Speaker

@cyrilleleclerc
blog.xebia.fr

Cyrille Le Clerc
Large Scale

In Memory Data Grid
Open Source
(Apache CXF, ...)

“you build it, you run it”

2

Once upon a time...

3

On the Financial side

Needs within the financial market:
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency

- Released Coherence in 2001
- Started as a distributed cache

- Released Gigaspaces XAP in 2001
- Started as a data grid

4

Let’s define an In Memory Data Grid ...

5

Let’s define an In Memory Data Grid

eXtreme Scale

This is an In Memory Data Grid

6


This is Network Attached Memory

7


• Similarities with document-oriented NoSQL stores
Partitioned, distributed hashtable, schema-less, value is not opaque,
scale-out scalability

• Very fast
In memory (persistence coming), business logic inside the data

• Consistent and Available
Transactional, redundant

• Written in Java, data are POJOs
Not necessarily

• Clients in Java, Microsoft, etc
8

Use cases for this presentation

9

Train Booking System

trains, stations,
seats, booking and
passengers

10

eCommerce Web Site

warehouse &
customers shopping carts
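[Figure: customers' shopping carts next to warehouse stock counters]

Shopping carts (item: quantity):
canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
ipad: 1, iphone: 1
barbie: 1, iphone: 1, cabbage-doll: 1

Warehouse stock counters: 231, 311, 121, 264, 2, 637, 12

{
"name": "Barbie Computer",
"stock": 637,
"weight" : 200
}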

warehouse stocks

11

In Memory Data Grids Key Principles

12

Store Everything in a Mainframe !

3 TB of RAM
80 x 5.2 GHz cores
Much more than $1,000,000

http://ibm.com/

IBM z11

13

Spread on Inexpensive Servers

http://ibm.com/

http://1userverrack.net/

Mainframe -> Cheap Servers !

14

Partition Data

Partition gamma

Small
servers
Partition beta

MainFrame
Partition alpha

Partition for scalability
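The partitioning idea above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the partitioning algorithm of Coherence, Gigaspaces or eXtreme Scale: the class name `PartitionedGrid` and its methods are hypothetical, and the key is mapped to a partition with a simple hash modulo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: entries are spread over a fixed number of partitions. */
class PartitionedGrid {
    private final List<Map<String, Object>> partitions = new ArrayList<>();

    PartitionedGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Maps a key to a partition index; floorMod keeps the index non-negative. */
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, Object value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    Object get(String key) {
        return partitions.get(partitionOf(key)).get(key);
    }

    int sizeOfPartition(int index) {
        return partitions.get(index).size();
    }

    public static void main(String[] args) {
        PartitionedGrid grid = new PartitionedGrid(3); // alpha, beta, gamma
        grid.put("tgv-3071-20110512", "Paris -> Marseille");
        System.out.println("stored in partition " + grid.partitionOf("tgv-3071-20110512"));
        System.out.println(grid.get("tgv-3071-20110512"));
    }
}
```

Each partition can then live on a different inexpensive server; adding partitions adds capacity.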

15

Duplicate Data

sync synchronization
Master

Partition alpha

Standby Backup

Duplicate data for high availability
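A minimal sketch of the master / backup duplication shown above, assuming synchronous replication inside a single process. The class `ReplicatedPartition` and its methods are hypothetical names; a real grid replicates over the network and elects the new primary itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one partition kept as a primary copy plus a synchronous backup. */
class ReplicatedPartition {
    private Map<String, String> primary = new HashMap<>();
    private Map<String, String> backup = new HashMap<>();

    /** Synchronous replication: the write returns only once both copies are updated. */
    void put(String key, String value) {
        primary.put(key, value);
        backup.put(key, value);
    }

    String get(String key) {
        return primary.get(key);
    }

    /** Simulates the loss of the primary: the backup is promoted and a new backup is rebuilt. */
    void failPrimary() {
        primary = backup;
        backup = new HashMap<>(primary);
    }

    int backupSize() {
        return backup.size();
    }

    public static void main(String[] args) {
        ReplicatedPartition alpha = new ReplicatedPartition();
        alpha.put("barbie", "637");
        alpha.failPrimary();
        System.out.println(alpha.get("barbie")); // still readable after the failover
    }
}
```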

16

Data Access Patterns

17


• This is not traditional Java EE coding style !

• Can apply very complex business logic inside the data
Stored Procedures Style

Change management challenge !

18

Pattern : Targeted Operation

19

Pattern: Targeted Operation

{
"train-id": "tgv-3071-20110512",
"time" : 2011/05/12 12:15,
"departure" : "Paris",
"arrival" : "Marseille",
"seats" : 3,
}

“train-id” is indexed

[Figure: a "Search Trains" service runs on each of Partition alpha, Partition beta and Partition gamma]

Book Train Tickets
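The targeted operation pattern can be sketched as follows: the booking logic runs on the single partition that owns the "train-id" key, so no other partition is contacted. This is an illustration only; `TargetedBooking`, `load` and `book` are hypothetical names, and the hash-modulo routing is an assumption rather than the routing of any particular product.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a booking is executed only on the partition owning the train id. */
class TargetedBooking {
    private final List<Map<String, Integer>> partitions = new ArrayList<>();
    private int partitionsVisited = 0;

    TargetedBooking(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    private Map<String, Integer> ownerOf(String trainId) {
        return partitions.get(Math.floorMod(trainId.hashCode(), partitions.size()));
    }

    /** Stores the number of free seats of a train in its owning partition. */
    void load(String trainId, int freeSeats) {
        ownerOf(trainId).put(trainId, freeSeats);
    }

    /** Books seats next to the data; returns false if the train is unknown or full. */
    boolean book(String trainId, int seats) {
        Map<String, Integer> owner = ownerOf(trainId);
        partitionsVisited++; // exactly one partition is touched per booking
        Integer free = owner.get(trainId);
        if (free == null || free < seats) {
            return false;
        }
        owner.put(trainId, free - seats);
        return true;
    }

    int freeSeats(String trainId) {
        Integer free = ownerOf(trainId).get(trainId);
        return free == null ? 0 : free;
    }

    int partitionsVisited() {
        return partitionsVisited;
    }

    public static void main(String[] args) {
        TargetedBooking grid = new TargetedBooking(3);
        grid.load("tgv-3071-20110512", 5);
        System.out.println(grid.book("tgv-3071-20110512", 3));
        System.out.println(grid.freeSeats("tgv-3071-20110512"));
    }
}
```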
20

Pattern : Map Reduce Style Operation

21

Pattern: Map Reduce

{
"departure": "Paris",
"arrival": "Marseille",
"time" : 2011/05/12 12:00,
"seats" : 3,
}

[Figure: the request is sent to the "Search Trains" service of Partition alpha, Partition beta and Partition gamma]

Distributed “Search Train Ticket”
22

Pattern: Map Reduce

Search Trains on Partition gamma returns:
{
"Paris -> Marseille : 12:15",
"Paris -> Marseille : 13:15"
}

Search Trains on Partition beta returns:
{ #NONE# }

Search Trains on Partition alpha returns:
{
"Paris -> Lyon -> Marseille : 12:40"
}

23

Pattern: Map Reduce

[Figure: the partial results of the "Search Trains" services of Partition alpha, Partition beta and Partition gamma are merged into one response]

{
"Paris -> Marseille : 12:15",
"Paris -> Lyon -> Marseille : 12:40",
"Paris -> Marseille : 13:15"
}
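The map reduce style search of the previous slides can be sketched like this: each partition scans its local trips (map), then the partial results are merged and sorted by departure time (reduce). The names `ScatterGatherSearch`, `Trip` and `samplePartitions` are hypothetical, the "Paris -> Lille" trip is illustrative filler data, and the partitions are visited sequentially here whereas a grid would query them in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a scatter / gather search over all partitions. */
class ScatterGatherSearch {
    static final class Trip {
        final String route;
        final String time; // "HH:mm", so lexicographic order is chronological

        Trip(String route, String time) {
            this.route = route;
            this.time = time;
        }

        @Override
        public String toString() {
            return route + " : " + time;
        }
    }

    /** Map step: one partition scans only its own local data. */
    static List<Trip> searchPartition(List<Trip> localTrips, String from, String to) {
        List<Trip> matches = new ArrayList<>();
        for (Trip trip : localTrips) {
            if (trip.route.startsWith(from + " ->") && trip.route.endsWith("-> " + to)) {
                matches.add(trip);
            }
        }
        return matches;
    }

    /** Reduce step: merge the partial results and sort them by departure time. */
    static List<String> search(List<List<Trip>> partitions, String from, String to) {
        List<Trip> merged = new ArrayList<>();
        for (List<Trip> partition : partitions) {
            merged.addAll(searchPartition(partition, from, to));
        }
        merged.sort(Comparator.comparing((Trip t) -> t.time));
        List<String> result = new ArrayList<>();
        for (Trip trip : merged) {
            result.add(trip.toString());
        }
        return result;
    }

    /** Sample data mirroring the slides, plus one non-matching trip on beta. */
    static List<List<Trip>> samplePartitions() {
        List<Trip> gamma = Arrays.asList(
                new Trip("Paris -> Marseille", "12:15"),
                new Trip("Paris -> Marseille", "13:15"));
        List<Trip> beta = Arrays.asList(new Trip("Paris -> Lille", "12:30"));
        List<Trip> alpha = Arrays.asList(new Trip("Paris -> Lyon -> Marseille", "12:40"));
        return Arrays.asList(gamma, beta, alpha);
    }

    public static void main(String[] args) {
        for (String line : search(samplePartitions(), "Paris", "Marseille")) {
            System.out.println(line);
        }
    }
}
```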

24


• This is not traditional Java EE coding style
Change management

• Don’t forget “Map Reduce” = “Distributed Table Scan”

Use Indexes
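To illustrate why an index matters, here is a minimal sketch of one partition that answers the same question either with a full scan or with an index lookup. `IndexedPartition` and its fields are hypothetical; the index is a plain map from the departure city to the train ids, which is an assumption and not how any specific grid stores its indexes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: the same query answered by a table scan and by an index. */
class IndexedPartition {
    private final Map<String, String> departureByTrainId = new HashMap<>();
    private final Map<String, Set<String>> trainIdsByDeparture = new HashMap<>();
    private int scannedEntries = 0;

    /** Stores the entry and keeps the index in sync with it. */
    void put(String trainId, String departure) {
        String previous = departureByTrainId.put(trainId, departure);
        if (previous != null) {
            trainIdsByDeparture.get(previous).remove(trainId);
        }
        trainIdsByDeparture.computeIfAbsent(departure, d -> new TreeSet<>()).add(trainId);
    }

    /** Table scan: every entry of the partition is examined. */
    Set<String> scan(String departure) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : departureByTrainId.entrySet()) {
            scannedEntries++;
            if (entry.getValue().equals(departure)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Index lookup: a single map access, no entry is scanned. */
    Set<String> lookup(String departure) {
        return trainIdsByDeparture.getOrDefault(departure, Collections.emptySet());
    }

    int scannedEntries() {
        return scannedEntries;
    }

    public static void main(String[] args) {
        IndexedPartition partition = new IndexedPartition();
        partition.put("tgv-3071", "Paris");
        partition.put("tgv-6602", "Lyon");
        System.out.println(partition.scan("Paris"));
        System.out.println(partition.lookup("Paris"));
    }
}
```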

25

CAP Theorem & In Memory Data Grids

26

CAP Theorem and In Memory Data Grid

Brewer’s Conjecture

Only 2 of these 3 properties can be achieved at any given moment in time:
• Consistency
• Availability
• Partition Tolerance

http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
27

CAP Theorem and In Memory Data Grid

Brewer’s Conjecture

Data Grids

Only 2 of these 3 properties can be achieved at any given moment in time:
• Consistency
• Availability
• Partition Tolerance

http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
28

Cross Data Center Data Consistency

London
New York
Tokyo

World wide replication
for financial market
29


West Coast
{
"stock": 147,
"weight" : 200
}

East Coast
{
"stock": 147,
"weight" : 200
}

Warehouse stocks

30


set stock to 146

{
"stock": 147,
"weight" : 200
}
{
"weight" : 200
}

East Coast
propagation delay !

31


set stock to 146

{
"stock": 147,
"weight" : 200
}
{
"weight" : 200
}

East Coast
set weight 175
reconciliation API needed !
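One possible reconciliation policy for the situation above is a field-level three-way merge: each data center's copy is compared with the last commonly known state, and changes to different fields are combined. This is a sketch of one policy among many, not the reconciliation API of any product; `FieldLevelReconciler` is a hypothetical name, and rejecting conflicting changes to the same field is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: field-level three-way merge of two replicas. */
class FieldLevelReconciler {

    /**
     * Merges the west and east copies against their common ancestor.
     * Throws if both sides changed the same field to different values.
     */
    static Map<String, Integer> reconcile(Map<String, Integer> base,
                                          Map<String, Integer> west,
                                          Map<String, Integer> east) {
        Map<String, Integer> merged = new HashMap<>(base);
        for (String field : base.keySet()) {
            Integer original = base.get(field);
            Integer westValue = west.get(field);
            Integer eastValue = east.get(field);
            boolean westChanged = !Objects.equals(original, westValue);
            boolean eastChanged = !Objects.equals(original, eastValue);
            if (westChanged && eastChanged && !Objects.equals(westValue, eastValue)) {
                throw new IllegalStateException("Conflict on field " + field);
            }
            merged.put(field, westChanged ? westValue : eastValue);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> base = new HashMap<>();
        base.put("stock", 147);
        base.put("weight", 200);
        Map<String, Integer> west = new HashMap<>(base);
        west.put("stock", 146);  // "set stock to 146"
        Map<String, Integer> east = new HashMap<>(base);
        east.put("weight", 175); // "set weight 175"
        Map<String, Integer> merged = reconcile(base, west, east);
        System.out.println(merged.get("stock") + " " + merged.get("weight"));
    }
}
```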

32


set stock to 146

{
"stock": 147,
"weight" : 200
}
{
"weight" : 200
}

East Coast
set weight 175
Network partitioning

33

Data Modeling

34

Data Modeling

• Dominant Question Driven Design
Opposite to Relational which is Domain Driven Design

• Constrained Tree Schema (CTS)
Because RPC matters

• Denormalized
Due to dominant questions and CTS

35

Data Modeling

Seat: number, price
Booking: reduction
Passenger: name
Train: code, type
TrainStop: date
TrainStation: code, name

Typical relational data model

36

Data Modeling

Partitioning ready entities tree

Train (root entity): code, type
Seat: number, price
Booking: reduction
Passenger: name
TrainStop: date

TrainStation (reference data, duplicated in each data grid node): code, name

Find the root entity and denormalize

37

Data Modeling

Remove unused data

Partitioned:
Train: code, type
Seat: number, price, booked
Booking: reduction
Passenger: name
TrainStop: date

Replicated:
TrainStation: code, name

38

Data Modeling

Partitioned:
Train: code, type
Seat: number, price, booked
TrainStop: date

Replicated:
TrainStation: code, name

Data Grid Ready data structure
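The resulting structure can be written down as plain Java classes: `Train` is the root of a tree that owns its seats and stops and is stored in one partition, while `TrainStation` is small reference data copied on every node. This is a sketch only; the wrapper class `TrainModel`, the `book` method and the `stationCode` field linking a stop to its station are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the data grid ready structure of the train booking model. */
class TrainModel {

    /** Replicated reference data: the same copy is present on every grid node. */
    static final class TrainStation {
        final String code;
        final String name;

        TrainStation(String code, String name) {
            this.code = code;
            this.name = name;
        }
    }

    /** Child of Train; refers to the replicated station by its code only (assumption). */
    static final class TrainStop {
        final String date;
        final String stationCode;

        TrainStop(String date, String stationCode) {
            this.date = date;
            this.stationCode = stationCode;
        }
    }

    /** Child of Train; the denormalized "booked" flag replaces Booking and Passenger. */
    static final class Seat {
        final int number;
        final int price;
        boolean booked;

        Seat(int number, int price) {
            this.number = number;
            this.price = price;
        }
    }

    /** Root entity: the whole tree lives in the partition selected by the train code. */
    static final class Train {
        final String code;
        final String type;
        final List<Seat> seats = new ArrayList<>();
        final List<TrainStop> trainStops = new ArrayList<>();

        Train(String code, String type) {
            this.code = code;
            this.type = type;
        }

        /** Books a free seat without leaving the partition; false if taken or unknown. */
        boolean book(int seatNumber) {
            for (Seat seat : seats) {
                if (seat.number == seatNumber && !seat.booked) {
                    seat.booked = true;
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Train train = new Train("tgv-3071-20110512", "HIGH_SPEED");
        train.seats.add(new Seat(1, 80));
        train.trainStops.add(new TrainStop("2011/05/12 12:15", "PAR"));
        System.out.println(train.book(1));
    }
}
```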

39

Data Modeling is Hard !

40


Account (from): number
CashWithdrawal: date, amount

Account (to): number
CashWithdrawal: date, amount

MoneyTransfer (from one Account, to the other): id, date, amount

Two root entities for the
same MoneyTransfer !

41


Account: number
CashWithdrawal: date, amount
MoneyTransferIn: id, date, amount

Account: number
MoneyTransferOut: id, date, amount
CashWithdrawal: date, amount

Split MoneyTransfer

42



Account: number
CashWithdrawal: date, amount
MoneyTransferOut: id, date, amount
MoneyTransferIn: id, date, amount

Data Grid Ready data structure
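A small sketch of the split: one logical money transfer is recorded as a MoneyTransferOut under the debited account and a MoneyTransferIn under the credited account, both sharing the same id, so each account remains a self-contained tree. `SplitMoneyTransfer`, `Entry` and `balanceDelta` are hypothetical names. Making the two writes atomic across partitions is a transaction concern and is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a MoneyTransfer split into an "out" and an "in" entry. */
class SplitMoneyTransfer {

    static final class Entry {
        final String id;
        final String date;
        final int amount;

        Entry(String id, String date, int amount) {
            this.id = id;
            this.date = date;
            this.amount = amount;
        }
    }

    /** Root entity: everything needed to compute the account's movements is local. */
    static final class Account {
        final String number;
        final List<Entry> cashWithdrawals = new ArrayList<>();
        final List<Entry> moneyTransfersOut = new ArrayList<>();
        final List<Entry> moneyTransfersIn = new ArrayList<>();

        Account(String number) {
            this.number = number;
        }

        /** Net movement of the account, computed from its own tree only. */
        int balanceDelta() {
            int delta = 0;
            for (Entry in : moneyTransfersIn) delta += in.amount;
            for (Entry out : moneyTransfersOut) delta -= out.amount;
            for (Entry withdrawal : cashWithdrawals) delta -= withdrawal.amount;
            return delta;
        }
    }

    /** Records the same transfer id on both sides (no atomicity guarantee in this sketch). */
    static void transfer(String id, String date, int amount, Account from, Account to) {
        from.moneyTransfersOut.add(new Entry(id, date, amount));
        to.moneyTransfersIn.add(new Entry(id, date, amount));
    }

    public static void main(String[] args) {
        Account a = new Account("FR-001");
        Account b = new Account("FR-002");
        transfer("t-1", "2011/05/25", 100, a, b);
        System.out.println(a.balanceDelta() + " " + b.balanceDelta());
    }
}
```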

44

Grid Internals

45

Data Serialization

• Used for data transfer and byte-oriented storage
Must support evolvable data structures

• Hot topic: Apache Thrift, Apache Avro, Google Protocol Buffers
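What "evolvable" means can be shown with a toy format in which every property is written with a numeric index, so that an older reader simply skips the indexes it does not know. This is a simplified sketch under stated assumptions; it is not the wire format of POF, Thrift, Avro or Protocol Buffers, and `EvolvableCodec` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: index-tagged properties let old readers ignore new fields. */
class EvolvableCodec {

    /** Writes the property count, then each property as (index, UTF-8 value). */
    static byte[] write(Map<Integer, String> properties) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(properties.size());
            for (Map.Entry<Integer, String> property : properties.entrySet()) {
                out.writeInt(property.getKey());
                out.writeUTF(property.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every property but keeps only the indexes known to this reader version. */
    static Map<Integer, String> read(byte[] data, Set<Integer> knownIndexes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<Integer, String> result = new TreeMap<>();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                String value = in.readUTF();
                if (knownIndexes.contains(index)) {
                    result.put(index, value);
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> v2 = new TreeMap<>();
        v2.put(0, "tgv-3071");
        v2.put(1, "Paris - Marseille");
        v2.put(2, "HIGH_SPEED"); // property added in a newer version
        Set<Integer> v1Reader = new java.util.HashSet<>(java.util.Arrays.asList(0, 1));
        System.out.println(read(write(v2), v1Reader));
    }
}
```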

46

Data Storage

• Store Java Beans in the grid
No need to unmarshall for in-process operations
Beware of the garbage collector !

• Store byte arrays in the grid
Pay for marshalling / unmarshalling at each write and read
Low-level / byte-oriented APIs to read data
Slightly more garbage collector friendly

47

Communication Protocols

• UDP multicast (Coherence, Gigaspaces)

• TCP/IP (WebSphere eXtreme Scale)

48

Topology

• Partitions are made of shards (1 primary + 0..* backups)

• Dynamic shard location (changes at runtime and at restart)

• Can use dedicated “directory servers” or embed the directory in the “data nodes”

49

JVM and Memory

• Many vendors recommend tiny 1.4 GB JVMs !
Garbage collector hell

• More than ten JVMs per server
Management hell

More and more IMDGs support large heaps

50

APIs

51

Raw Java Mapping with Oracle Coherence
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Train extends AbstractEvolvable implements PortableObject {
    enum Type {
        HIGH_SPEED, NORMAL
    }

    /** Key of the Cache */
    String code;

    /** Indexed */
    String name;

    Type type;

    List<Seat> seats = new ArrayList<Seat>();

    int version;

    List<TrainStop> trainStops = new ArrayList<TrainStop>();

    @Override
    public int getImplVersion() {
        return 1;
    }

    /** Each property is read from the POF index it was written at. */
    @Override
    public void readExternal(PofReader pofReader) throws IOException {
        this.code = pofReader.readString(0);
        this.name = pofReader.readString(1);
        this.type = (Type) pofReader.readObject(2);
        pofReader.readCollection(3, this.seats);
        pofReader.readCollection(4, this.trainStops);
        this.version = pofReader.readInt(5);
    }

    /** Indexes must stay aligned with readExternal. */
    @Override
    public void writeExternal(PofWriter pofWriter) throws IOException {
        pofWriter.writeString(0, this.code);
        pofWriter.writeString(1, this.name);
        pofWriter.writeObject(2, this.type);
        pofWriter.writeCollection(3, this.seats, Seat.class);
        pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
        pofWriter.writeInt(5, this.version);
    }
}

hand-coded serialization
JUnit is your friend !
52

JPA Style Mapping with WebSphere eXtreme Scale

@Entity(schemaRoot=true)
public class Train {

    @Id
    String code;

    @Index
    @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;

    ...
}

sub entities can have
cross relations

53

Map API with Oracle Coherence

NamedCache trainCache = CacheFactory.getCache("train-cache");

/** Save */
void persist(Train train) {
    trainCache.put(train.getCode(), train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) trainCache.get(code);
}

/** Find by Query Language */
Train findByTrainName(String name) {
    Filter filter = QueryHelper.createFilter("name = :name",
            Collections.singletonMap("name", name));
    Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
    if (trainEntrySet.isEmpty()) {
        return null;
    } else {
        return trainEntrySet.iterator().next().getValue();
    }
}

Map API
54

JPA Style with WebSphere eXtreme Scale

/** Save */
void persist(Train train) {
    entityManager.persist(train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) entityManager.find(Train.class, code);
}

/** Query Language */
Train findByTrainName(String name) {
    Query q = entityManager.createQuery("select t from Train t where t.name=:name");
    q.setParameter("name", name);

    return (Train) q.getSingleResult();
}

JPA Style Entity Manager

55

Creating Indexes

Map reduce (without index) = Distributed Table Scan !

56

Indexes with Oracle Coherence

class Train {

    String name;

    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

{
    NamedCache trainCache = CacheFactory.getCache("train-cache");

    trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
    trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
}

57

Indexes with WebSphere eXtreme Scale

@Entity(schemaRoot=true)
class Train {

    @Index
    @Basic
    String name;

    @Index
    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

Query query = em.createQuery("select t from Train t where t.name=:name");
query.getPlan();

This is an execution plan

for q2 in Train ObjectMap using INDEX on name = ( ?name)
    filter ( q2.c[0] = ?name )
returning new Tuple( q2 )

58

More APIs

Another Java EE versus Spring battle ?
JSR 347 Data Grids vs. Spring Data

Serialization / Object to Tuple Mapping API ?

Unified API on top of NoSQL stores ?

59

Data Grid <-> Relational Database Interactions

60

Data Grid <-> Relational Database

Data Grids are “In Memory” -> we need to persist data on disk !

61


update / insert / delete

“select directly modified in DB”

62

Data Grid -> Relational Database

backend DB

Highly available write behind queues
+ SQL batched statements
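A write-behind queue with batching can be sketched as follows. Updates to the same key are coalesced, and rows are handed to the database writer in batches; a row leaves the queue only after its batch has been written, so a failed write is retried on the next flush. `WriteBehindQueue` and `BatchWriter` are hypothetical names, the writer stands in for a batched SQL statement, and the replication that makes the queue highly available is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a write-behind queue flushing to a database in batches. */
class WriteBehindQueue {

    /** Stands in for a batched SQL statement executed against the backend DB. */
    interface BatchWriter {
        void writeBatch(Map<String, String> rows);
    }

    private final LinkedHashMap<String, String> pending = new LinkedHashMap<>();
    private final int batchSize;
    private final BatchWriter writer;

    WriteBehindQueue(int batchSize, BatchWriter writer) {
        this.batchSize = batchSize;
        this.writer = writer;
    }

    /** Queues a row; a later update of the same key replaces the pending one. */
    synchronized void enqueue(String key, String value) {
        pending.put(key, value);
    }

    /** Writes all pending rows in batches and returns the number of batches. */
    synchronized int flush() {
        int batches = 0;
        while (!pending.isEmpty()) {
            Map<String, String> batch = new LinkedHashMap<>();
            for (Map.Entry<String, String> row : pending.entrySet()) {
                if (batch.size() == batchSize) {
                    break;
                }
                batch.put(row.getKey(), row.getValue());
            }
            writer.writeBatch(batch); // if this throws, the rows stay pending
            pending.keySet().removeAll(batch.keySet());
            batches++;
        }
        return batches;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> written = new ArrayList<>();
        WriteBehindQueue queue = new WriteBehindQueue(2, written::add);
        queue.enqueue("seat-1", "booked");
        queue.enqueue("seat-2", "booked");
        queue.enqueue("seat-3", "booked");
        System.out.println(queue.flush() + " batches written");
    }
}
```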
63


Data Grid -> Relational Database

Seat
number
price
booked
Train
code
type

TrainStation
TrainStop
code
date
name

Constrained Tree Schema <-> Relational
Impedance Mismatch

64


DB writes MUST succeed !

Prefer raw SQL rather than reused business logic
Denormalize the database
Remove the foreign keys, use same PKs in DB and data grid
Support unordered SQL statements

Align the database on the Data Grid model !

65

Relational Database -> Data Grid

select * from train
where last_modif > ?

backend DB

Data Grid Originated Scheduled Refresh
(Oracle System Change Number, etc)
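The scheduled refresh can be sketched as a polling loop that remembers a watermark: each run asks the database only for rows modified after the highest last_modif value seen so far. `ScheduledRefresh`, `Row` and `ChangeSource` are hypothetical names; `ChangeSource` stands in for the SQL query above and is simulated in memory here, with a plain number used as the modification marker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the grid polls the database for rows changed since the last run. */
class ScheduledRefresh {

    static final class Row {
        final String code;
        final String name;
        final long lastModif;

        Row(String code, String name, long lastModif) {
            this.code = code;
            this.name = name;
            this.lastModif = lastModif;
        }
    }

    /** Stands in for "select * from train where last_modif > ?". */
    interface ChangeSource {
        List<Row> modifiedSince(long watermark);
    }

    private final Map<String, String> cache = new HashMap<>();
    private long watermark = Long.MIN_VALUE;

    /** Loads the changed rows into the cache and advances the watermark. */
    int refresh(ChangeSource database) {
        List<Row> changed = database.modifiedSince(watermark);
        for (Row row : changed) {
            cache.put(row.code, row.name);
            watermark = Math.max(watermark, row.lastModif);
        }
        return changed.size();
    }

    String get(String code) {
        return cache.get(code);
    }

    /** In-memory stand-in for the backend DB table, for demonstration only. */
    static ChangeSource inMemory(final List<Row> table) {
        return watermark -> {
            List<Row> changed = new ArrayList<>();
            for (Row row : table) {
                if (row.lastModif > watermark) {
                    changed.add(row);
                }
            }
            return changed;
        };
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("tgv-3071", "Paris - Marseille", 10L));
        ScheduledRefresh refresher = new ScheduledRefresh();
        System.out.println(refresher.refresh(inMemory(table)) + " row(s) loaded");
    }
}
```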

66

Relational Database -> Data Grid

backend DB

Database Originated Push
JMS = durable subscription
(Oracle Database Change Notification, etc)
67


• In Memory -> prepare for reloading after maintenance operations !
Need for “graceful shutdown with disk persistence”

• Prepare consistency checkers
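A consistency checker can be as simple as comparing a snapshot of the grid with a snapshot of the database and listing the keys that differ. The sketch below assumes both sides can be exported as key / value maps, which is a simplification; `ConsistencyChecker` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: reports the keys whose value differs between grid and database. */
class ConsistencyChecker {

    /** Returns the keys that are missing on one side or hold different values. */
    static SortedSet<String> inconsistentKeys(Map<String, String> grid,
                                              Map<String, String> database) {
        Set<String> allKeys = new HashSet<>(grid.keySet());
        allKeys.addAll(database.keySet());
        SortedSet<String> inconsistent = new TreeSet<>();
        for (String key : allKeys) {
            if (!Objects.equals(grid.get(key), database.get(key))) {
                inconsistent.add(key);
            }
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        Map<String, String> grid = new HashMap<>();
        grid.put("barbie", "637");
        grid.put("ipod", "231");
        Map<String, String> database = new HashMap<>();
        database.put("barbie", "636");
        database.put("ipod", "231");
        System.out.println(inconsistentKeys(grid, database));
    }
}
```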

68

Transactions

69

We didn’t have the time to talk about
transactions.

Another session is planned at Paris NoSQL
User Group for this.

70

Let’s go live !

71

Data Grids and Operations

• Standard packaging?
Do It Yourself (layout, scripts, etc)

• Limited Management
Do It Yourself (stop/start, detecting data loss, etc)

• Limited debugging tools
Do It Yourself (debugging consoles, troubleshooting agents)

• JVM pandemic
Dozens of JVMs to manage !

72

Data Grids and Operations

• Dev / Ops collaboration is required

• Experts only !

73

The right tool for the right job

74

The right tool for the right job

• Incredibly fast ! Even with transactions !

• Scalable
If you solve the data loading issue

• Good at data replication (when it implements it)
Reconciliation API, etc

• Very geeky on both dev and ops side
Not an enterprise grade data store
Requires very skilled people + change management

• “Quite” expensive

75

Questions / Answers

?

76
