© 2017 GridGain Systems, Inc.
In-Memory Performance
Durability of Disk
In-Memory Computing Essentials
for Java Developers
Denis Magda
Ignite PMC Chair
GridGain Director of Product Management
Agenda
• Apache Ignite Overview
• Clustering and Deployment
• Distributed Storage
• Distributed SQL
• Distributed Computations
• Machine Learning
• Memory Architecture & Persistence
Apache Ignite In-Memory Computing Platform
• SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
• Memory-Centric Storage
• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
• Third-Party Persistence (RDBMS, HDFS, NoSQL)
• IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Clustering and Deployment
Clustering
• Server Nodes
  • Act as containers for data and computations
  • Generally started as standalone processes
• Client Nodes
  • Provide a cluster entry point to run operations
  • Embedded in application code
Deployment
• Nodes are logical entities
  • Each node runs in a JVM process
  • Many nodes can run in a single JVM process
• On-Premises and Cloud
  • Physical server or VM
  • AWS, Azure, Google Compute Engine
  • Kubernetes, Mesos, YARN
Distributed Storage
Distributed Storage
Distributed Key-Value Store
• JCache & SQL
• ACID Transactions
• Distributed partitioned hash map
• Dynamic Scaling
• 3rd party storage caching (RDBMS, NoSQL, HDFS)
APIs: JCache, Transactions, Compute, SQL
Cluster: three Server Nodes, each with DURABLE MEMORY
Where Does an Entry Go?
put(key, value): Ignite Node 1 or Ignite Node 2?
Key to Node Mapping
Key → Partition → Server Node (ON-DISK)
Caches and Partitions
Cache
• Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)
• Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)
Partitions Distribution
• Ignite Node 1: partitions 0, 2, 4, 6, 8, 10, 12, 14
• Ignite Node 2: partitions 1, 3, 5, 7, 9, 11, 13, 15
Where Does an Entry Go?
put(key, value)
• Ignite Node 1: partitions 0, 2, 4
• Ignite Node 2: partitions 1, 3, 5
The key maps to a partition, and the partition determines the node that stores the entry.
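The mapping on these slides can be sketched in a few lines of Java. This is a simplified illustration, not Ignite's actual affinity function (the default one is based on rendezvous hashing and is more elaborate). The class name PartitionMapper, the modulo-based hashing, and the round-robin placement of partitions on nodes are assumptions chosen to reproduce the even/odd layout shown above.

```java
import java.util.List;

/** Conceptual sketch of key -> partition -> node mapping (not Ignite's real affinity function). */
final class PartitionMapper {
    /** Maps a key to one of the given number of partitions using its hash code. */
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Places partitions round-robin; with two nodes, even partitions go to node 0 and odd ones to node 1. */
    static int nodeFor(int partition, int nodes) {
        return partition % nodes;
    }

    public static void main(String[] args) {
        int partitions = 16; // hypothetical partition count
        int nodes = 2;
        for (String key : List.of("K1", "K2", "K3")) {
            int p = partitionFor(key, partitions);
            System.out.println(key + " -> partition " + p + " -> Ignite Node " + (nodeFor(p, nodes) + 1));
        }
    }
}
```

Because the key is hashed to a partition first, adding a node only requires moving whole partitions; the key-to-partition step stays the same.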
Backup Copies
• Primary partitions 0, 1, 2, 3 are spread across four Ignite Nodes
• Backup copies of partitions 0, 1, 2, 3 are each kept on a different node than the primary
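A minimal sketch of how primary and backup owners could be assigned, assuming a simple "next node in the ring" rule. The class BackupAssignment and this placement rule are hypothetical and only illustrate that a backup never shares a node with its primary; Ignite's own assignment is computed by its affinity function.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual sketch: each partition gets a primary node plus backup nodes on different machines. */
final class BackupAssignment {
    /** Returns the owners of a partition: the primary first, followed by the backup nodes. */
    static List<Integer> ownersOf(int partition, int nodes, int backups) {
        if (backups >= nodes) {
            throw new IllegalArgumentException("need more nodes than backups");
        }
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i <= backups; i++) {
            owners.add((partition + i) % nodes); // hypothetical rule: next node in the ring
        }
        return owners;
    }

    public static void main(String[] args) {
        for (int p = 0; p < 4; p++) {
            List<Integer> owners = ownersOf(p, 4, 1);
            System.out.println("partition " + p + ": primary on node " + owners.get(0)
                + ", backup on node " + owners.get(1));
        }
    }
}
```

With one backup per partition, losing any single node still leaves one copy of every partition in the cluster.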
Distributed SQL
Distributed SQL
• DDL, DML Support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform Compatibility
• Indexes in RAM or Disk
• Dynamic Scaling
Connectivity: JDBC, ODBC, SQL API (Java, .NET, C++), BI Tools
Apache Ignite Cluster: three Server Nodes, each with DURABLE MEMORY
Connectivity
• JDBC
• ODBC
• REST
• Java, .NET and C++ APIs
// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");

# Connect from the command line with SQLLine.
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/
Data Definition Language
• CREATE/DROP TABLE
• CREATE/DROP INDEX
• ALTER TABLE
• Changes Durability
  • Ignite Native Persistence
CREATE TABLE `city` (
`ID` INT(11),
`Name` CHAR(35),
`CountryCode` CHAR(3),
`District` CHAR(20),
`Population` INT(11),
PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";
Data Manipulation Language
• ANSI-99 specification
• Fault-tolerant and consistent
• INSERT, UPDATE, DELETE
• SELECT
  • JOINs
  • Subqueries
SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;
Affinity Collocation
Tables: Country, Language, City
• key (country = 5) → Partition 10
• key (cityId = 10, countryId = 5) → Partition 10
• key (cityId = 11, countryId = 9) → Partition 12
Partitions are stored on Server Nodes (ON-DISK)
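The idea on this slide is that a City entry is partitioned by its countryId rather than by its full key, so it lands in the same partition as its Country. A minimal sketch, assuming a hypothetical partition count of 16 and plain modulo hashing (so the resulting partition numbers differ from the illustrative 10 and 12 on the slide):

```java
/** Conceptual sketch of affinity collocation: cities are partitioned by their country. */
final class AffinityCollocation {
    static final int PARTITIONS = 16; // hypothetical partition count

    /** A Country is partitioned by its own id. */
    static int partitionOfCountry(int countryId) {
        return Math.floorMod(Integer.hashCode(countryId), PARTITIONS);
    }

    /** A City is partitioned by its affinity key (countryId); cityId is deliberately ignored. */
    static int partitionOfCity(int cityId, int countryId) {
        return partitionOfCountry(countryId);
    }

    public static void main(String[] args) {
        System.out.println("country 5 -> partition " + partitionOfCountry(5));
        System.out.println("city (10, 5) -> partition " + partitionOfCity(10, 5));
        System.out.println("city (11, 9) -> partition " + partitionOfCity(11, 9));
    }
}
```

In the CREATE TABLE example earlier, the `affinityKey=CountryCode` parameter plays the role that countryId plays in this sketch.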
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results in one
• Ignite Node: Canada (Toronto, Ottawa, Montreal, Calgary)
• Ignite Node: India (Mumbai, New Delhi)
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
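The three steps above can be sketched as a small map-reduce in plain Java. Maps stand in for Ignite nodes here; the class CollocatedJoinSketch and its methods are hypothetical and only illustrate why a collocated join needs no data movement between nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Conceptual sketch of a collocated join; plain maps stand in for Ignite nodes. */
final class CollocatedJoinSketch {
    /** Two hypothetical nodes, each storing a country together with its cities. */
    static List<Map<String, List<String>>> sampleCluster() {
        return List.of(
            Map.of("Canada", List.of("Toronto", "Ottawa", "Montreal", "Calgary")),
            Map.of("India", List.of("Mumbai", "New Delhi")));
    }

    /** Step 2: a node joins the country with its cities using local data only. */
    static List<String> localJoin(Map<String, List<String>> node, String country) {
        List<String> rows = new ArrayList<>();
        for (String city : node.getOrDefault(country, List.of())) {
            rows.add(country + ", " + city);
        }
        return rows;
    }

    /** Steps 1 and 3: send the query to every node, then reduce the partial results into one. */
    static List<String> query(List<Map<String, List<String>>> cluster, String country) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> node : cluster) {
            result.addAll(localJoin(node, country));
        }
        return result;
    }

    public static void main(String[] args) {
        query(sampleCluster(), "Canada").forEach(System.out::println);
    }
}
```

Each node contributes only finished result rows, so the reducer never has to fetch cities from another node.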
Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results in one
Data layout in the diagram (cities are not collocated with their countries):
• Ignite Node: Canada; Toronto, Calgary
• Ignite Node: India; Montreal, Ottawa
• Also shown: Mumbai, New Delhi
• Data moved in step 3: Montreal, Ottawa
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
Distributed Computations
Compute Grid
• C = C1 + C2 (C = Compute)
• R = R1 + R2 (R = Result)
• in T/2 time
• Automatic Failover
• Load Balancing
• Zero Deployment
Ignite Cluster: two nodes with DURABLE MEMORY run C1 and C2 and return R1 and R2
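The split shown on this slide (C = C1 + C2, R = R1 + R2) can be illustrated on a single machine, with two threads standing in for two cluster nodes. This is only an analogy under that assumption; the class SplitCompute is hypothetical and does not use the Ignite compute API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.LongStream;

/** Conceptual sketch: split one computation in two, run the halves in parallel, add the results. */
final class SplitCompute {
    /** One sub-computation (C1 or C2): sums the integers in [from, to). */
    static long partialSum(long from, long to) {
        return LongStream.range(from, to).sum();
    }

    /** Splits C into C1 and C2, runs both concurrently, and reduces R = R1 + R2. */
    static long parallelSum(long from, long to) {
        long mid = from + (to - from) / 2;
        CompletableFuture<Long> r1 = CompletableFuture.supplyAsync(() -> partialSum(from, mid));
        CompletableFuture<Long> r2 = CompletableFuture.supplyAsync(() -> partialSum(mid, to));
        return r1.join() + r2.join();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1, 101)); // prints 5050
    }
}
```

The "T/2" on the slide is the ideal case: it holds only when the two halves are equally expensive and the cost of splitting and reducing is negligible.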
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set
Diagram: the Client Node pulls Data 1 and Data 2 from two Server Nodes (ON-DISK) and processes them itself

Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results in one
Diagram: the Client Node sends the computation to two Server Nodes (ON-DISK) and reduces their results
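To make the difference concrete, the sketch below counts how many items cross the network in each model. Lists stand in for server nodes and the "network" is just a counter; the class ProcessingModels and the sample records are hypothetical.

```java
import java.util.List;

/** Conceptual sketch comparing network traffic of client-server and co-located processing. */
final class ProcessingModels {
    /** Client-server: every record travels to the client, which processes the entire data set. */
    static long[] clientServer(List<List<String>> nodes, String city) {
        long moved = 0;
        long matches = 0;
        for (List<String> node : nodes) {
            for (String record : node) {
                moved++; // the record is shipped over the network
                if (record.equals(city)) {
                    matches++; // and only then processed on the client
                }
            }
        }
        return new long[] {matches, moved};
    }

    /** Co-located: each node counts locally and ships back a single partial result. */
    static long[] colocated(List<List<String>> nodes, String city) {
        long moved = 0;
        long matches = 0;
        for (List<String> node : nodes) {
            long local = node.stream().filter(city::equals).count(); // runs where the data lives
            moved++; // one partial result per node
            matches += local;
        }
        return new long[] {matches, moved};
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(List.of("NYC", "NYC", "Boston"), List.of("NYC", "Chicago"));
        System.out.println("client-server moved " + clientServer(nodes, "NYC")[1] + " items");
        System.out.println("co-located moved " + colocated(nodes, "NYC")[1] + " items");
    }
}
```

Both methods return the same answer; only the second number, the count of items moved, differs, and it grows with the data size in the client-server case but with the node count in the co-located case.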
Machine Learning
Genetic Algorithm Grid
• Biological Evolution Simulation
• Chromosome and Genes Cluster
• Collocated Computation
F = Fitness Calculation, C = Crossover, M = Mutation
• F = F1 + F2
• C = C1 + C2
• M = M1 + M2
Ignite Cluster: two nodes with DURABLE MEMORY run (F1, C1, M1) and (F2, C2, M2)
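The three operations named on this slide (fitness calculation, crossover, mutation) can be shown with a tiny single-JVM genetic algorithm. This is a generic textbook-style sketch, not the GA Grid API: the class TinyGeneticAlgorithm, the "count the true genes" objective, and the keep-the-top-half selection rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Tiny single-JVM genetic algorithm; GA Grid distributes these steps across the cluster. */
final class TinyGeneticAlgorithm {
    /** F: fitness is the number of genes set to true (a hypothetical objective). */
    static int fitness(boolean[] chromosome) {
        int score = 0;
        for (boolean gene : chromosome) {
            if (gene) score++;
        }
        return score;
    }

    /** C: single-point crossover, genes [0, point) come from a and the rest from b. */
    static boolean[] crossover(boolean[] a, boolean[] b, int point) {
        boolean[] child = Arrays.copyOf(b, b.length);
        System.arraycopy(a, 0, child, 0, point);
        return child;
    }

    /** M: returns a copy with the gene at the given index flipped. */
    static boolean[] mutate(boolean[] chromosome, int index) {
        boolean[] copy = Arrays.copyOf(chromosome, chromosome.length);
        copy[index] = !copy[index];
        return copy;
    }

    /** Evolves a random population (size >= 2) and returns the best fitness found. */
    static int evolve(int populationSize, int genes, int generations, long seed) {
        Random rnd = new Random(seed);
        boolean[][] population = new boolean[populationSize][genes];
        for (boolean[] c : population) {
            for (int g = 0; g < genes; g++) c[g] = rnd.nextBoolean();
        }
        for (int gen = 0; gen < generations; gen++) {
            // Selection: sort fittest first; the top half survives unchanged.
            Arrays.sort(population, (x, y) -> Integer.compare(fitness(y), fitness(x)));
            int survivors = populationSize / 2;
            for (int i = survivors; i < populationSize; i++) {
                boolean[] a = population[rnd.nextInt(survivors)];
                boolean[] b = population[rnd.nextInt(survivors)];
                population[i] = mutate(crossover(a, b, rnd.nextInt(genes)), rnd.nextInt(genes));
            }
        }
        int best = 0;
        for (boolean[] c : population) best = Math.max(best, fitness(c));
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best fitness: " + evolve(20, 16, 50, 42L));
    }
}
```

Because the surviving half is never modified, the best fitness cannot decrease from one generation to the next. In a distributed setting, the fitness evaluation is typically the expensive part and the natural candidate for collocated execution.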
Machine Learning Grid
• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Dense and Sparse Algebra
• Large Scale Parallelization
• Multi-Language Support: R, C++, Python, Java, Scala, REST
• No ETL
Distributed Core Algebra runs on three Server Nodes, each with DURABLE MEMORY
Memory Architecture & Persistence
Durable Memory
• Off-heap
• Removes noticeable GC pauses
• Automatic Defragmentation
• Predictable memory consumption
• Stores Superset of Data
• Fully Transactional (Write-Ahead Log)
• Instantaneous Restarts
Ignite Cluster: three Server Nodes, each with DURABLE MEMORY
Regions and Segments
• Memory split into regions
• Regions split into segments
• Segments include pages
B+Tree
• Self-balancing tree
• Memory & Disk
• Sorted Index
  • Secondary Indexes
• Hash Index
  • Primary Keys
  • Hash code based sorting
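What a sorted secondary index buys you is cheap range scans. The sketch below uses java.util.TreeMap, which is a red-black tree rather than a B+Tree, purely to show that behavior; the class SortedIndexSketch and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Conceptual sketch of a sorted secondary index (TreeMap stands in for a B+Tree). */
final class SortedIndexSketch {
    // Secondary index: population -> names of cities with that population.
    private final TreeMap<Integer, List<String>> byPopulation = new TreeMap<>();

    /** Adds a city to the index under its population. */
    void put(String city, int population) {
        byPopulation.computeIfAbsent(population, k -> new ArrayList<>()).add(city);
    }

    /** Range scan: cities with population in [min, max], in ascending population order. */
    List<String> range(int min, int max) {
        List<String> result = new ArrayList<>();
        byPopulation.subMap(min, true, max, true).values().forEach(result::addAll);
        return result;
    }

    public static void main(String[] args) {
        SortedIndexSketch index = new SortedIndexSketch();
        index.put("A", 100);
        index.put("B", 300);
        index.put("C", 200);
        System.out.println(index.range(150, 300));
    }
}
```

A hash-ordered index, as used for primary keys on this slide, supports fast point lookups but cannot answer such a range query without scanning every entry.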
Free Lists
• Track pages of ~ equal free space
  • 25% free
  • 75% free
• Essential for updates
  • Gives page with min size needed
• Reduces fragmentation
  • Lowers page compaction activity
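A free list can be pictured as a set of buckets, each holding pages with roughly the same amount of free space. The sketch below is a simplified model under several assumptions (a 4096-byte page, four buckets, and a conservative lookup that may skip a page that would barely fit); the class FreeListSketch is hypothetical and does not mirror Ignite's internal data structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Conceptual sketch of a free list: pages are bucketed by how much free space they have. */
final class FreeListSketch {
    static final int PAGE_SIZE = 4096;            // hypothetical page size in bytes
    static final int BUCKETS = 4;                 // roughly 0-25%, 25-50%, 50-75%, 75-100% free
    static final int STEP = PAGE_SIZE / BUCKETS;

    private final List<Deque<Integer>> buckets = new ArrayList<>(); // page ids per bucket
    private final List<Integer> freeBytes = new ArrayList<>();      // free space per page id

    FreeListSketch() {
        for (int i = 0; i < BUCKETS; i++) buckets.add(new ArrayDeque<>());
    }

    /** Bucket i holds pages whose free space is at least i * STEP bytes. */
    static int bucketOf(int free) {
        return Math.min(free / STEP, BUCKETS - 1);
    }

    /** Stores the given number of bytes and returns the id of the page used. */
    int allocate(int size) {
        if (size <= 0 || size > PAGE_SIZE) throw new IllegalArgumentException("size out of range");
        // Start at the smallest bucket whose pages are all guaranteed to fit the request.
        for (int b = (size + STEP - 1) / STEP; b < BUCKETS; b++) {
            Integer page = buckets.get(b).poll();
            if (page != null) return take(page, size);
        }
        freeBytes.add(PAGE_SIZE); // nothing suitable: allocate a fresh page
        return take(freeBytes.size() - 1, size);
    }

    private int take(int page, int size) {
        int free = freeBytes.get(page) - size;
        freeBytes.set(page, free);
        buckets.get(bucketOf(free)).add(page); // re-file the page under its new bucket
        return page;
    }

    int freeBytesOf(int page) {
        return freeBytes.get(page);
    }
}
```

Starting the search at the smallest adequate bucket is what keeps nearly empty pages available for large entries and reduces fragmentation.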
Ignite Native Persistence
Server Node
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 ... Partition File N)
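The four steps on this slide can be modelled with in-memory collections. In this sketch a list stands in for the write-ahead log and a map stands in for the partition files; the class WalSketch and its crashAndRecover method are hypothetical, and real durability would of course require actual disk writes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch of update -> WAL -> ack -> checkpoint, using in-memory stand-ins for disk. */
final class WalSketch {
    final Map<String, String> ram = new HashMap<>();            // in-memory data
    final List<String[]> wal = new ArrayList<>();               // write-ahead log of {key, value} records
    final Map<String, String> partitionFiles = new HashMap<>(); // stands in for on-disk partition files

    /** Steps 1-3: update RAM, append the change to the WAL, then acknowledge. */
    boolean put(String key, String value) {
        ram.put(key, value);
        wal.add(new String[] {key, value});
        return true; // ack
    }

    /** Step 4: copy the in-memory state to the partition files; older WAL records become redundant. */
    void checkpoint() {
        partitionFiles.putAll(ram);
        wal.clear();
    }

    /** Simulates a restart after a crash: RAM is lost and rebuilt from partition files plus WAL replay. */
    void crashAndRecover() {
        ram.clear();
        ram.putAll(partitionFiles);
        for (String[] record : wal) {
            ram.put(record[0], record[1]);
        }
    }
}
```

The point of the log is that an update made after the last checkpoint is still recoverable, because replaying the WAL on top of the partition files reproduces it.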
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite
#denismagda

In-Memory Computing Essentials for Architects and Engineers


Editor's Notes

  • #5: The Apache Ignite Platform. Apache Ignite is a memory-centric data platform used to build fast, scalable and resilient solutions. At the heart of the platform lies a distributed memory-centric data store with ACID semantics and powerful processing APIs including SQL, Compute, Key/Value and transactions. The memory-centric approach lets Apache Ignite leverage memory for high throughput and low latency while using local disk or SSD to provide durability and fast recovery. The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not just as a caching layer, as in most databases. For example, Apache Ignite can function in a pure in-memory mode, in which case it can be treated as an In-Memory Database (IMDB) and In-Memory Data Grid (IMDG) in one. When persistence is turned on, Ignite functions as a memory-centric system where most of the processing happens in memory, but the data and indexes are persisted to disk. The main difference here from a traditional disk-centric RDBMS or NoSQL system is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs. The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing alongside advanced monitoring and management. The platform caters for a range of use cases including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics and machine learning.
  • #10: Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk within distributed clusters and provides extensive APIs. Ignite Data Grid can be viewed as a distributed partitioned hash map with every cluster node owning a portion of the overall data. This way the more cluster nodes we add, the more data we can store.
  • #20: Apache Ignite incorporates distributed SQL database capabilities as a part of its platform. The database is horizontally scalable, fault tolerant and SQL ANSI-99 compliant. It supports all SQL DML commands including SELECT, UPDATE, INSERT, MERGE, and DELETE queries. It also provides support for a subset of DDL commands relevant for distributed databases. Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows executing distributed SQL operations across different memory layers, achieving in-memory performance with the durability of disk. You can interact with Apache Ignite using the SQL language via natively developed APIs for Java, .NET and C++, or via the Ignite JDBC or ODBC drivers. This provides true cross-platform connectivity from languages such as PHP, Ruby and more.
  • #28: Ignite In-Memory Compute Grid allows executing distributed computations in a parallel fashion to gain high performance, low latency, and linear scalability. Ignite compute grid provides a set of simple APIs that allow users to distribute computations and data processing across multiple computers in the cluster. Disk-centric systems, like RDBMS or NoSQL, generally use the classic client-server approach, where the data is brought from the server to the client side, where it gets processed and then is usually discarded. This approach does not scale well, as moving the data over the network is the most expensive operation in a distributed system. A much more scalable approach is collocated processing, which reverses the flow by bringing the computations to the servers where the data actually resides. This approach allows you to execute advanced logic or distributed SQL with JOINs exactly where the data is stored, avoiding expensive serialization and network trips.
  • #29: https://ignite.apache.org/collocatedprocessing.html Collocating computations with data minimizes data serialization over the network and can significantly improve the performance and scalability of your application. Whenever possible, make a best effort to collocate your computations with the cluster nodes caching the data that needs to be processed. Let's assume that a blizzard is approaching New York City. You, as a telecommunication company, have to warn everyone by sending a text message with precise instructions on how to behave in such weather conditions. There are around 8 million New Yorkers in your database who have to receive the text message. With the client-server approach, the company has to connect to the database and move all 8 million (!) records from there to a client application that will text everyone. This is highly inefficient and wastes the network and computational resources of the company's IT infrastructure. However, if the company initially collocates all the cities it covers with the people who live there, it can send a single computation (!) to the cluster node that stores information about all New Yorkers and send the text messages from there. This approach avoids moving 8 million records over the network and uses cluster resources for the computation. That's collocated processing in action!
  • #31: https://github.com/techbysample/gagrid GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributed GA library built on top of the mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real-world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing and more. Glossary:
    • Chromosome: a sequence of Genes. A Chromosome represents a potential solution.
    • Crossover: the process in which the genes within chromosomes are combined to derive new chromosomes.
    • Fitness Score: a numerical score that measures the value of a particular Chromosome (i.e. solution) relative to other Chromosomes in the population.
    • Gene: a discrete building block that makes up the Chromosome.
    • Genetic Algorithm (GA): a method of solving optimization problems by simulating the process of biological evolution. A GA continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. After subsequent generations, a GA will "evolve" the population toward an optimal solution.
    • Mutation: the process where genes within a chromosome are randomly updated to produce new characteristics.
    • Population: the collection of potential solutions or Chromosomes.
    • Selection: the process of choosing candidate solutions (Chromosomes) for the next generation.
  • #32: DEMO: run several ML samples from the standard distribution. Main benefits: no ETL (online, “in place” ML); in-memory speed and scale; large-scale parallelization; optimized ML/DL algorithms; last-mile GPU optimization. The rationale for building ML Grid is quite simple. Many users employ Ignite as the central high-performance storage and processing system for various data sets. If they wanted to perform ML or Deep Learning (DL) on these data sets (i.e., training sets or model inference), they first had to ETL them into other systems such as Apache Mahout or Apache Spark. The roadmap for ML Grid is to start with a core algebra implementation based on Ignite's co-located distributed processing. The initial version was released with Ignite 2.0. Future releases will introduce custom DSLs for Python, R, and Scala; a growing collection of optimized ML algorithms such as Linear and Logistic Regression, Decision Tree/Random Forest, SVM, and Naive Bayes; as well as support for Ignite-optimized Neural Networks and integration with TensorFlow. The current beta version of the Apache Ignite Machine Learning Grid (ML Grid) provides a distributed machine learning library built on top of the highly optimized and scalable Apache Ignite platform; it implements local and distributed vector and matrix algebra operations as well as distributed versions of widely used algorithms.
  • #34: The Apache Ignite memory-centric platform is based on the Durable Memory architecture, which allows storing and processing data and indexes both in memory and on disk when the Ignite Persistent Store feature is enabled. The memory architecture helps achieve in-memory performance with the durability of disk, using all the available resources of the cluster. Ignite's Durable Memory is built and operates in a way similar to the Virtual Memory of operating systems such as Linux. However, one significant difference between the two architectures is that Durable Memory always keeps the whole data set and indexes on disk if the Ignite Persistent Store is used, while Virtual Memory uses the disk for swapping purposes only. In-Memory: off-heap memory; removes noticeable GC pauses; automatic defragmentation; predictable memory consumption; boosts SQL performance. On Disk: optional persistence; support for flash, SSD, and Intel 3D XPoint; stores a superset of the data; fully transactional (Write-Ahead Log, WAL); instantaneous cluster restarts.
  • #35: Ignite Native Persistence is a distributed, ACID- and SQL-compliant disk store that transparently integrates with Ignite's Durable Memory as an optional disk layer, storing data and indexes on SSD, Flash, 3D XPoint, and other types of non-volatile storage. With Ignite Persistence enabled, you no longer need to keep all the data and indexes in memory or warm them up after a node or cluster restart, because the Durable Memory is tightly coupled with persistence and treats it as a secondary memory tier. This implies that if a subset of data or an index is missing in RAM, the Durable Memory will take it from the disk.
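The "secondary memory tier" behavior can be sketched as a small two-tier store in plain Java. This is only a conceptual model, not how Ignite implements it: the `TieredStoreSketch` class is hypothetical, the "disk" is simulated by an ordinary map, and the least-recently-used eviction policy is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual two-tier store; NOT Ignite's implementation. The "disk" is simulated in memory.
class TieredStoreSketch {
    /** Stand-in for the disk tier: always holds the full data set. */
    private final Map<String, String> disk = new HashMap<>();

    /** RAM tier: bounded, evicts the least recently used entry (LRU is an assumption). */
    private final LinkedHashMap<String, String> ram;

    /** Counts how often a read had to fall back to the disk tier. */
    int diskReads = 0;

    TieredStoreSketch(final int ramCapacity) {
        ram = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > ramCapacity;
            }
        };
    }

    /** Writes go to both tiers, so the disk tier keeps a superset of what is in RAM. */
    void put(String key, String value) {
        disk.put(key, value);
        ram.put(key, value);
    }

    /** Reads are served from RAM when possible; on a miss the value is loaded from disk. */
    String get(String key) {
        String value = ram.get(key);
        if (value == null && disk.containsKey(key)) {
            value = disk.get(key);
            diskReads++;
            ram.put(key, value); // keep the freshly read value in RAM for later reads
        }
        return value;
    }

    boolean inRam(String key) {
        return ram.containsKey(key);
    }

    public static void main(String[] args) {
        TieredStoreSketch store = new TieredStoreSketch(2);
        store.put("a", "A");
        store.put("b", "B");
        store.put("c", "C"); // RAM holds two entries, so "a" is evicted from RAM only
        System.out.println(store.get("a") + ", disk reads: " + store.diskReads);
    }
}
```

Nothing needs to be preloaded after a restart in this model: the RAM tier simply starts empty and fills up as keys are read.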
  • #37: A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B+Tree is a central part of the whole Ignite Durable Memory architecture because even basic key-value operations (cache.get and cache.put) go through it! Move to the next slide.
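A lookup through a B+Tree can be sketched as follows. This is a simplified, hand-built two-level tree in plain Java, not Ignite's page-based implementation: the node layout, the integer keys, and the class names are assumptions, and insertion with node splitting is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified read-only B+Tree; NOT Ignite's implementation. Layout and names are illustrative.
class BPlusTreeSketch {
    static abstract class Node {}

    /** Leaves hold the actual key/value pairs and are linked for sequential access. */
    static class Leaf extends Node {
        final int[] keys;
        final String[] values;
        Leaf next;

        Leaf(int[] keys, String[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    /** Inner nodes only route: children[i] covers keys below separators[i]; the last child covers the rest. */
    static class Inner extends Node {
        final int[] separators;
        final Node[] children;

        Inner(int[] separators, Node[] children) {
            this.separators = separators;
            this.children = children;
        }
    }

    /** Descends from the root to the leaf whose key range contains the key. */
    static Leaf findLeaf(Node node, int key) {
        while (node instanceof Inner) {
            Inner inner = (Inner) node;
            int i = 0;
            while (i < inner.separators.length && key >= inner.separators[i]) i++;
            node = inner.children[i];
        }
        return (Leaf) node;
    }

    /** Point lookup: route to the leaf, then binary-search inside it. */
    static String get(Node root, int key) {
        Leaf leaf = findLeaf(root, key);
        int pos = Arrays.binarySearch(leaf.keys, key);
        return pos >= 0 ? leaf.values[pos] : null;
    }

    /** Range scan: find the first leaf, then follow the leaf links. */
    static List<String> range(Node root, int from, int to) {
        List<String> out = new ArrayList<>();
        for (Leaf leaf = findLeaf(root, from); leaf != null; leaf = leaf.next) {
            for (int i = 0; i < leaf.keys.length; i++) {
                if (leaf.keys[i] > to) return out;
                if (leaf.keys[i] >= from) out.add(leaf.values[i]);
            }
        }
        return out;
    }

    /** Builds a small sample tree by hand: one inner node over three linked leaves. */
    static Node sample() {
        Leaf l1 = new Leaf(new int[] {1, 4}, new String[] {"v1", "v4"});
        Leaf l2 = new Leaf(new int[] {7, 9}, new String[] {"v7", "v9"});
        Leaf l3 = new Leaf(new int[] {12, 15}, new String[] {"v12", "v15"});
        l1.next = l2;
        l2.next = l3;
        return new Inner(new int[] {7, 12}, new Node[] {l1, l2, l3});
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(get(root, 9));
        System.out.println(range(root, 4, 12));
    }
}
```

Keeping values only in the leaves and linking the leaves together is what makes both point lookups and ordered scans cheap.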
  • #38: On the previous slide we explained how to look up a value inside the Durable Memory. But how does the Durable Memory know where to put a new value? Ignite uses a special data structure called a Free List for this. Basically, a free list is a doubly linked list that stores references to pages with approximately equal free space. For instance, there is a free list that stores all the data pages that have up to 75% free space, and a list that keeps track of the index pages with 25% capacity left. Data and index pages are tracked in separate free lists.
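The free-list idea can be sketched in plain Java as follows. This is a hedged illustration, not Ignite's code: the 4096-byte page size, the four 25%-wide buckets, and the `FreeListSketch` class are assumptions, and only one kind of page is modeled (the notes say data and index pages use separate free lists).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Illustrative free list; NOT Ignite's implementation. Page size and bucket count are assumptions.
class FreeListSketch {
    static final int PAGE_SIZE = 4096;
    static final int BUCKETS = 4; // bucket b holds pages with roughly b*25% to (b+1)*25% free space

    static class Page {
        final int id;
        int free = PAGE_SIZE;

        Page(int id) {
            this.id = id;
        }
    }

    /** One doubly linked list of pages per free-space range. */
    private final List<LinkedList<Page>> buckets = new ArrayList<>();
    private int nextPageId = 0;

    FreeListSketch() {
        for (int b = 0; b < BUCKETS; b++) buckets.add(new LinkedList<>());
    }

    /** Maps an amount of free space to the bucket that tracks pages of about that fullness. */
    static int bucketOf(int freeBytes) {
        return Math.min(BUCKETS - 1, freeBytes * BUCKETS / PAGE_SIZE);
    }

    /** Stores a row of the given size and returns the id of the page that received it. */
    int insert(int rowSize) {
        if (rowSize <= 0 || rowSize > PAGE_SIZE) {
            throw new IllegalArgumentException("row size must be between 1 and " + PAGE_SIZE);
        }
        Page page = takePage(rowSize);
        page.free -= rowSize;
        if (page.free > 0) {
            buckets.get(bucketOf(page.free)).addFirst(page); // re-file under its new fullness
        }
        return page.id;
    }

    /** Finds a tracked page with enough room, or allocates a fresh page if none fits. */
    private Page takePage(int rowSize) {
        for (int b = bucketOf(rowSize); b < BUCKETS; b++) {
            Iterator<Page> it = buckets.get(b).iterator();
            while (it.hasNext()) {
                Page candidate = it.next();
                if (candidate.free >= rowSize) { // needed in the first bucket, always true in higher ones
                    it.remove();
                    return candidate;
                }
            }
        }
        return new Page(nextPageId++);
    }

    public static void main(String[] args) {
        FreeListSketch freeList = new FreeListSketch();
        System.out.println(freeList.insert(3000));
        System.out.println(freeList.insert(2000));
        System.out.println(freeList.insert(1000));
    }
}
```

Grouping pages by approximate free space means a suitable page is found by inspecting a handful of lists instead of scanning every page.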
  • #39: Ignite Native Persistence is a distributed, ACID- and SQL-compliant disk store that transparently integrates with Ignite's Durable Memory as an optional disk layer, storing data and indexes on SSD, Flash, 3D XPoint, and other types of non-volatile storage. With Ignite Persistence enabled, you no longer need to keep all the data and indexes in memory or warm them up after a node or cluster restart, because the Durable Memory is tightly coupled with persistence and treats it as a secondary memory tier. This implies that if a subset of data or an index is missing in RAM, the Durable Memory will take it from the disk.