Data Has a Better Idea: The In-Memory Data Grid
Photo by Franki Chamaki on Unsplash

About me
- Passionate software engineer
- Focused mainly on the JVM
- Interested in all phases of software development
- Holds heretical opinions
- Not politically correct, just correct
- Personal belief: “living in a distributed, reactive system full of actors”

Agenda
1. What is an In-Memory Data Grid (IMDG)?
2. Hazelcast IMDG
3. Cluster Discovery
4. Partitioning and Replication
5. Data Structure Overview
6. User-Code Deployment & Hazelcast-Spring
7. Demo time!

What is an In-Memory Data Grid (IMDG)?
A Data Grid is a system of multiple servers that work together to manage
information and related operations in a distributed environment.

The servers in the grid can sit in the same location or be distributed across
multiple data centers.

An In-Memory Data Grid is a data grid that stores its data entirely in RAM.

Why use an In-Memory Data Grid?
Performance
● Access data up to 1000x faster than from a disk-based database
● Low latency for batch and stream processing
Data structure/Handling
● Non-relational key-value storage
● ACID compliance
Operations
● Scalability
● Redundancy for high availability (HA)

When to use an In-Memory Data Grid?
Data Cache
● Eliminates data store bottlenecks
● Eliminates slow network connections
● Long-running blocking calculations
Data Service Fabric
● Real-time integration
● Compute grid
● Message broker
Examples
● Analytics (Risk, Fraud-detection)
● Trading Systems (FX Trading, Stock Exchange)
● eCommerce
● Online Gaming

Basic operations of an In-Memory Data Grid
Cluster
● Distributed data
● Highly scalable
● Fault tolerance
Discovery
● Form
● Join
● Find
Data Distribution
● Replication/Mirroring
● Partitioning/Sharding

Replication and Partitioning
Replication - all the data is replicated (synchronously or asynchronously) to
every node in the cluster.

Example: a relational database cluster (MySQL, Oracle, etc.) using the
leader/follower (master/slave) model.
- Synchronous replication: a write must be acknowledged (ACKed) by a
configurable number of followers before the leader reports success.
- Asynchronous replication: the leader reports success as soon as the write is
committed to its own disk/memory; followers apply the changes at their own
pace.
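
The difference between the two modes can be sketched in plain Java. This is a toy, single-threaded model and not how any particular database implements replication: the classes `Leader` and `Follower`, the backlog queue and the `requiredAcks` parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy, single-threaded model of leader/follower replication (hypothetical names).
class ReplicationSketch {

    /** A follower holds its own copy of the data plus a backlog of changes not yet applied. */
    static final class Follower {
        final Map<String, String> data = new HashMap<>();
        private final Queue<String[]> backlog = new ArrayDeque<>();

        void enqueue(String key, String value) {
            backlog.add(new String[] {key, value});
        }

        /** Applies every pending change; stands in for the follower catching up at its own pace. */
        void catchUp() {
            String[] change;
            while ((change = backlog.poll()) != null) {
                data.put(change[0], change[1]);
            }
        }
    }

    static final class Leader {
        final Map<String, String> data = new HashMap<>();
        private final List<Follower> followers;

        Leader(List<Follower> followers) {
            this.followers = followers;
        }

        /** Synchronous write: returns only after requiredAcks followers have applied the change. */
        int writeSync(String key, String value, int requiredAcks) {
            if (requiredAcks > followers.size()) {
                throw new IllegalArgumentException("not enough followers to acknowledge the write");
            }
            data.put(key, value);
            int acks = 0;
            for (Follower follower : followers) {
                follower.enqueue(key, value);
                if (acks < requiredAcks) {
                    follower.catchUp(); // models waiting for this follower's ACK
                    acks++;
                }
            }
            return acks;
        }

        /** Asynchronous write: reports success right after the local write. */
        void writeAsync(String key, String value) {
            data.put(key, value);
            for (Follower follower : followers) {
                follower.enqueue(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Follower f1 = new Follower();
        Follower f2 = new Follower();
        Leader leader = new Leader(List.of(f1, f2));

        leader.writeSync("EUR/USD", "1.0843", 1);
        System.out.println("after sync write:  f1=" + f1.data + " f2=" + f2.data);

        leader.writeAsync("EUR/USD", "1.0850");
        System.out.println("after async write: f1=" + f1.data + " f2=" + f2.data);

        f1.catchUp();
        f2.catchUp();
        System.out.println("after catch-up:    f1=" + f1.data + " f2=" + f2.data);
    }
}
```

In this model, a follower that has not caught up yet can serve stale data, which is the inconsistency risk that asynchronous replication trades for lower write latency.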

Partitioning/Sharding - the most scalable distribution mode: data is kept in
multiple disjoint collections on different nodes. There are three common
partitioning strategies: range, hash and custom partitioning.
Examples: most NoSQL databases (Cassandra, MongoDB, etc.); Kafka also splits
topics into partitions.
- Range partitioning: uses the natural order of the keys to split the dataset
into the required number of partitions; e.g. MySQL partitions.
- Hash partitioning: computes a hash of each item's key and takes that hash
modulo the number of partitions to pick the target partition; e.g. Cassandra,
which uses a consistent hashing variant of this idea.
- Custom partitioning: exploits locality or uniqueness properties of the data
to calculate the partition in which to store it; e.g. pre-hashed data such as
git commits, or location-specific data such as all records from Europe.
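
The three strategies above can be sketched in a few lines of plain Java. This is only an illustration: the method names, the upper-bound list used for ranges and the region-based rule are assumptions made for the example, not the API of any database mentioned here.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Sketch of the three partitioning strategies (all names are hypothetical).
class PartitioningSketch {

    /**
     * Range partitioning: sorted upper bounds split the key space.
     * Keys greater than the last bound fall into one extra, final partition.
     */
    static int rangePartition(String key, List<String> sortedUpperBounds) {
        for (int i = 0; i < sortedUpperBounds.size(); i++) {
            if (key.compareTo(sortedUpperBounds.get(i)) <= 0) {
                return i;
            }
        }
        return sortedUpperBounds.size();
    }

    /** Hash partitioning: hash the key, then take the hash modulo the partition count. */
    static int hashPartition(Object key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Custom partitioning: the caller supplies a rule based on a property of the data. */
    static <T> int customPartition(T record, ToIntFunction<T> rule) {
        return rule.applyAsInt(record);
    }

    public static void main(String[] args) {
        List<String> bounds = List.of("h", "p"); // partitions: <= "h", <= "p", everything else
        System.out.println("range(kiwi)   = " + rangePartition("kiwi", bounds));
        System.out.println("hash(EUR/USD) = " + hashPartition("EUR/USD", 4));
        // Hypothetical rule: records whose id starts with "EU" go to partition 0.
        System.out.println("custom(EU-42) = "
                + customPartition("EU-42", id -> id.startsWith("EU") ? 0 : 1));
    }
}
```

Plain modulo hashing moves most keys when the partition count changes, which is one reason systems prefer a fixed partition count or consistent hashing.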

Replication vs Partitioning
Replication - Pros
● Workload scalability
● High availability
● Failure recovery
Replication - Cons
● Negative performance impact of replication
● Data inconsistency
● Memory scalability issues
Partitioning - Pros
● Failure recovery
● Memory scalability
● Good synchronization performance
Partitioning - Cons
● Negative performance impact of migration
● Large memory requirements

Deployment options
● Embedded
● Client-Server

Hazelcast IMDG - Characteristics

Why choose Hazelcast IMDG?
Market Leader
● Hazelcast IMDG is a market leader among In-Memory Data Grid solutions
Rich API
● APIs in various programming languages: Java, C#/.NET, Python, etc.
● Powerful features
● Huge user base - open source project
Ease of use
● Simple to use key-value data store
● Standard data structures: Map, List, Queue, etc.
● Clients for many programming languages
● Redundancy/fail-over/scaling built-in
Distributed data store & computation system
● Distributed data store
● Distributed computation near stored data

Business scenario and HLA - overview
Business scenario:
● We will use Hazelcast IMDG to develop a Foreign Exchange Quotation
Management System.
The system consists of:
● Two Spring Boot microservices, market-client and trader-cli, which are
Hazelcast clients and communicate with the grid via its APIs.
● One Spring Boot microservice, processing-unit, which is a Hazelcast server
member that joins the cluster when started.

Hazelcast features used
Deployment model:
● Client-Server
Cluster discovery mechanism:
● TCP/IP unicast discovery
Data structures used:
● Replicated Map
● Partitioned Map

Client-Server Deployment Model
Hazelcast Client
To create a Hazelcast client Java application, add the following dependencies:
● Prior to Hazelcast 4.x:
○ com.hazelcast:hazelcast:3.x
○ com.hazelcast:hazelcast-client:3.x
● For projects using Hazelcast 4.x:
○ com.hazelcast:hazelcast:4.x
Hazelcast Cluster Member
To create a Hazelcast cluster member Java application, add the following
dependency:
○ com.hazelcast:hazelcast:${version}

Hazelcast Cluster Discovery
There are multiple ways to establish a discovery mechanism inside our Hazelcast
cluster:
● Multicast (UDP)
● TCP/IP unicast
● Discovery plugins or cloud: Eureka, ZooKeeper, Kubernetes, OpenShift, Pivotal
Cloud Foundry (PCF), Google Cloud Platform (GCP), AWS, Azure
● Custom discovery mechanism via Discovery SPI

In our cluster members we use TCP/IP unicast discovery. The configuration is in
com.freesoft.fx.trading.processingunit.infrastructure.imdg.HazelcastConfiguration.java.
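
A minimal sketch of what a TCP/IP unicast setup could look like with Hazelcast's programmatic `Config` API. It is not the project's actual `HazelcastConfiguration`: the class name `TcpIpDiscoverySketch` and the member address `127.0.0.1` are placeholders, and the Hazelcast jar is assumed to be on the classpath.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Hypothetical example class; the address below is a placeholder.
class TcpIpDiscoverySketch {

    static Config tcpIpConfig() {
        Config config = new Config();
        JoinConfig join = config.getNetworkConfig().getJoin();

        // Turn multicast off so members only look at the static list below.
        join.getMulticastConfig().setEnabled(false);

        // TCP/IP unicast: list the well-known members a new node should contact.
        join.getTcpIpConfig()
                .setEnabled(true)
                .addMember("127.0.0.1");
        return config;
    }

    public static void main(String[] args) {
        HazelcastInstance member = Hazelcast.newHazelcastInstance(tcpIpConfig());
        System.out.println("Cluster members: " + member.getCluster().getMembers());
        member.shutdown();
    }
}
```

Starting the same program twice on one machine should make the second instance join the first, since both contact the address in the static member list.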

Hazelcast Replicated Map
In the processing-unit Hazelcast cluster member application we use a replicated
map to store the quote prices published by the market-client.
The map must have a name, in our case “QUOTES_MAP”; its entries are stored in
binary format on every cluster member.

The necessary configuration for the Replicated Map can be found in the
com.freesoft.fx.trading.processingunit.infrastructure.imdg.HazelcastConfiguration.java file.
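
As a rough illustration, a replicated map named “QUOTES_MAP” with binary in-memory format could be declared as below. This is a sketch under assumptions, not the project's actual configuration: the class name `ReplicatedMapSketch` is hypothetical and the Hazelcast jar is assumed to be on the classpath.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.ReplicatedMapConfig;

// Hypothetical example class showing member-side configuration only.
class ReplicatedMapSketch {

    static Config quotesMapConfig() {
        Config config = new Config();

        // Every member keeps a full copy of the map; values are held in serialized (binary) form.
        ReplicatedMapConfig quotes = new ReplicatedMapConfig("QUOTES_MAP")
                .setInMemoryFormat(InMemoryFormat.BINARY);

        config.addReplicatedMapConfig(quotes);
        return config;
    }
}
```

A member or client would then obtain the map with `getReplicatedMap("QUOTES_MAP")` on its `HazelcastInstance`.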

Hazelcast Partitioned Map
In the processing-unit Hazelcast cluster member application we use a partitioned
map to store all commands (Buy and Sell) published by each trader (trader-cli
microservice).
Hazelcast uses hash partitioning to distribute the data across all cluster
members.

The necessary configuration for the Partitioned Map can be found in the
com.freesoft.fx.trading.processingunit.infrastructure.imdg.HazelcastConfiguration.java file.
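
A partitioned map is Hazelcast's regular distributed map, configured through `MapConfig`. The sketch below is an assumption-laden illustration rather than the project's configuration: the map name `COMMANDS_MAP`, the class name and the backup count of 1 are all made up for the example.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;

// Hypothetical example class; the map name and backup count are placeholders.
class PartitionedMapSketch {

    static Config commandsMapConfig() {
        Config config = new Config();

        // Entries are spread over the cluster by key hash; each partition
        // additionally keeps one synchronous backup copy on another member.
        MapConfig commands = new MapConfig("COMMANDS_MAP")
                .setBackupCount(1);

        config.addMapConfig(commands);
        return config;
    }
}
```

Unlike the replicated map, each member stores only its share of the entries plus backups, so memory use per member shrinks as the cluster grows.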

User code deployment & Hazelcast-Spring

● Not enabled by default
● Allows us to load client classes inside cluster members
● Must be configured on both the client and the cluster member

[Code listings: client configuration and cluster member configuration]
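
A hedged sketch of the two sides of user code deployment using Hazelcast's programmatic configuration. The class name `com.example.trading.PriceUpdateProcessor` is a hypothetical placeholder for whatever client class should be shipped to the members, and the Hazelcast jar is assumed to be on the classpath.

```java
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.Config;

// Hypothetical example class; the deployed class name is a placeholder.
class UserCodeDeploymentSketch {

    /** Client side: enable the feature and name the classes to send to the cluster. */
    static ClientConfig clientConfig() {
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getUserCodeDeploymentConfig()
                .setEnabled(true)
                .addClass("com.example.trading.PriceUpdateProcessor");
        return clientConfig;
    }

    /** Member side: the feature is off by default, so it has to be enabled here as well. */
    static Config memberConfig() {
        Config config = new Config();
        config.getUserCodeDeploymentConfig().setEnabled(true);
        return config;
    }
}
```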

Hazelcast-Spring
● com.hazelcast:hazelcast-spring:${version}
● Dependency Inversion Principle
● @SpringAware
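
To illustrate how these points fit together, here is a small sketch assuming the hazelcast-spring and Spring jars are on the classpath. `QuoteRepository`, `CountQuotesTask` and `SpringAwareSketch` are hypothetical names; only `@SpringAware`, `SpringManagedContext` and `Config.setManagedContext` come from Hazelcast.

```java
import com.hazelcast.config.Config;
import com.hazelcast.spring.context.SpringAware;
import com.hazelcast.spring.context.SpringManagedContext;
import java.io.Serializable;
import java.util.concurrent.Callable;
import org.springframework.beans.factory.annotation.Autowired;

/** Hypothetical abstraction; its implementation is a Spring bean on the cluster member. */
interface QuoteRepository {
    int countQuotes();
}

/**
 * Task that a client can submit to a member. @SpringAware asks Hazelcast to pass the
 * deserialized instance through the Spring context, so the @Autowired field gets filled in.
 * The task depends only on the interface, in line with the Dependency Inversion Principle.
 */
@SpringAware
class CountQuotesTask implements Callable<Integer>, Serializable {

    @Autowired
    private transient QuoteRepository quoteRepository; // injected on the member, never serialized

    @Override
    public Integer call() {
        return quoteRepository.countQuotes();
    }
}

class SpringAwareSketch {

    /** Member-side wiring: @SpringAware takes effect only with a SpringManagedContext registered. */
    static Config memberConfig(SpringManagedContext springManagedContext) {
        return new Config().setManagedContext(springManagedContext);
    }
}
```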

Thank you!
● Don’t forget to follow me at @dinabogdan03
● Don’t forget to read my articles at @bogdan.dina03
● Join the Bucharest Apache Kafka meetup group!
