© 2017 GridGain Systems, Inc.
In-Memory Hammer for Your Data Science Toolkit
Apache Ignite
Denis Magda
Ignite PMC Chair
Apache Ignite PM
Agenda
• Apache Ignite Overview
• Use Cases
• Data Science Toolkit Box
• Data Grid
• Durable Memory
• Distributed SQL
• Compute Grid
• Machine Learning Grid (Beta)
• Q&A
Apache Ignite In-Memory Computing Platform
• Applications: Financial Services, E-Commerce, Travel & Logistics, Pharma & Healthcare, Telco, IoT
• Key/Value, SQL, Transactions, Compute, Services, ML, Streaming
• Memory-Centric Storage
• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
• Third-Party Persistence (RDBMS, HDFS, NoSQL)
Apache Ignite Use Cases
• FinTech
• Financial Services
• Software
• Logistics & Travel
• E-commerce
• Telco
• IoT
• Pharma & Healthcare
• Adtech
e-Therapeutics provides a computer-based drug
discovery platform and a specialized approach to
network biology.
Problem
• Analyzing a network of proteins that influence a disease, and discovering drugs, could take weeks
• Existing algorithms could not be parallelized

Apache Ignite Solution
• 80x speed increase over the non-parallelized environment
• Analysis projects complete in hours or minutes
• Computational resources available for previously abandoned research projects
- Drug Discovery and Network Biology
[Diagram: the e-Therapeutics Platform (client nodes) uses the Cache & Compute API of the server nodes: 100x cluster nodes on 5x physical nodes]
Data Grid
• Distributed key-value store
• Distributed partitioned hash map
• JCache & SQL
• ACID transactions
• Dynamic scaling
• 3rd-party storage caching (RDBMS, NoSQL, HDFS)
[Diagram: JCache, Transactions, Compute and SQL APIs on top of three server nodes, each with Durable Memory]
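
The "distributed partitioned hash map" idea can be illustrated with a toy, single-process sketch. This is not Ignite code and does not use the Ignite API: the `PartitionedMap` class, the partition count, and the round-robin partition-to-node assignment are all assumptions made for illustration, and Ignite's own affinity function differs.

```python
import zlib


class PartitionedMap:
    """Toy partitioned key-value store: key -> partition -> node (hypothetical)."""

    def __init__(self, node_names, partitions=16):
        self.partitions = partitions
        self.node_names = list(node_names)
        # One plain dict per node stands in for that node's memory.
        self.stores = {name: {} for name in self.node_names}

    def partition_of(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(str(key).encode("utf-8")) % self.partitions

    def node_of(self, key):
        # Assumed assignment: partitions are dealt round-robin to nodes.
        return self.node_names[self.partition_of(key) % len(self.node_names)]

    def put(self, key, value):
        self.stores[self.node_of(key)][key] = value

    def get(self, key):
        return self.stores[self.node_of(key)].get(key)


grid = PartitionedMap(["node-1", "node-2", "node-3"])
for i in range(1000):
    grid.put(f"key-{i}", i)

# How many entries each simulated node ended up holding.
sizes = {name: len(store) for name, store in grid.stores.items()}
```

Because every caller computes the same key-to-node mapping, a read goes straight to the node that owns the key; no central directory is consulted in this sketch.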
Durable Memory
• Off-heap: removes noticeable GC pauses
• Automatic defragmentation
• Stores a superset of data
• Predictable memory consumption
• Fully transactional (write-ahead log)
[Diagram: Ignite cluster of three server nodes, each with Durable Memory]
Ignite Native Persistence
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 ... Partition File N)
[Diagram: all four steps take place on a server node]
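
The update, persist, ack, checkpoint sequence can be sketched as an in-memory simulation. This is a conceptual toy, not Ignite's implementation: `ToyNode` is a hypothetical class, lists and dicts stand in for the log and partition files, and clearing the log after a checkpoint is a simplifying assumption.

```python
import zlib


class ToyNode:
    """Toy server node: RAM plus a simulated write-ahead log and partition files."""

    def __init__(self, partitions=2):
        self.ram = {}
        self.wal = []                                        # stand-in for the log file
        self.partition_files = [{} for _ in range(partitions)]
        self.dirty = set()                                   # keys changed since last checkpoint

    def _partition(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.partition_files)

    def update(self, key, value):
        self.ram[key] = value              # 1. Update RAM
        self.wal.append((key, value))      # 2. Persist the change to the log
        self.dirty.add(key)
        return "ack"                       # 3. Ack to the caller

    def checkpoint(self):
        # 4. Checkpointing: copy changed entries from RAM into partition files.
        for key in self.dirty:
            self.partition_files[self._partition(key)][key] = self.ram[key]
        self.dirty.clear()
        self.wal.clear()                   # assumption: the log is simply truncated

    def recover(self):
        # Rebuild state from partition files, then replay the remaining log.
        restored = {}
        for partition_file in self.partition_files:
            restored.update(partition_file)
        for key, value in self.wal:
            restored[key] = value
        return restored


node = ToyNode()
ack = node.update("a", 1)
node.update("b", 2)
node.checkpoint()
node.update("a", 3)        # logged, but not yet checkpointed
node.ram = {}              # simulate losing RAM in a crash
node.ram = node.recover()
```

The point of the sketch is the ordering: the caller is acknowledged once the change is in the log, and the slower copy into partition files happens later in the checkpoint.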
Distributed SQL
• DDL, DML support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform compatibility
• Indexes in RAM or on disk
• Dynamic scaling
[Diagram: tools (JDBC, ODBC, SQL API; Java, .NET, C++, BI) connect to an Apache Ignite cluster of three server nodes, each with Durable Memory]
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results into one

Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results into one

[Diagram: in both cases a client node queries two server nodes (on-disk); the numbers on the arrows correspond to the steps above]
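
The difference between the two join modes can be sketched with a small simulation that counts how many rows must cross node boundaries. The tables, the `node_for` placement rule, and the `run_join` helper are hypothetical; nothing here uses Ignite's SQL engine.

```python
NODES = 2

customers = {1: "Ann", 2: "Bob", 3: "Cid", 4: "Dee"}        # customer_id -> name
orders = [(10, 1), (11, 2), (12, 4), (13, 3), (14, 1)]      # (order_id, customer_id)


def node_for(key):
    # Assumed placement rule: key modulo the number of nodes.
    return key % NODES


def run_join(order_placement):
    """Join orders to customers; order_placement(order_id, customer_id) -> node index."""
    customer_parts = [{} for _ in range(NODES)]
    for cid, name in customers.items():
        customer_parts[node_for(cid)][cid] = name

    order_parts = [[] for _ in range(NODES)]
    for oid, cid in orders:
        order_parts[order_placement(oid, cid)].append((oid, cid))

    results, rows_moved = [], 0
    for node in range(NODES):                  # each node joins the orders it holds
        for oid, cid in order_parts[node]:
            owner = node_for(cid)
            if owner != node:                  # customer row lives on another node
                rows_moved += 1
            results.append((oid, customer_parts[owner][cid]))
    return sorted(results), rows_moved         # reduce partial results into one


# Collocated: each order is stored on the node that owns its customer.
collocated = run_join(lambda oid, cid: node_for(cid))
# Non-collocated: orders are placed by their own key, ignoring the customer.
non_collocated = run_join(lambda oid, cid: node_for(oid))
```

Both calls return the same joined rows; only the second one needs to fetch customer rows from a remote node, which is the "potential data movement" step.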
Compute Grid
• C = Compute, R = Result
• C = C1 + C2
• R = R1 + R2, in T/2 time
• Automatic failover
• Load balancing
• Zero deployment
[Diagram: Ignite cluster of two nodes with Durable Memory; one node runs C1 and returns R1, the other runs C2 and returns R2]
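
The split-and-combine pattern (C = C1 + C2, R = R1 + R2) can be sketched with two local worker threads standing in for two cluster nodes. This only shows the structure; the `compute` function and the data are made up, and Python threads will not reproduce the T/2 timing from the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(chunk):
    """One sub-computation (C1 or C2): here, a sum of squares over its chunk."""
    return sum(x * x for x in chunk)


data = list(range(1, 101))
half = len(data) // 2
chunks = [data[:half], data[half:]]                      # C = C1 + C2

# Two workers stand in for two nodes; each produces one partial result.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(compute, chunks))    # R1, R2

result = sum(partial_results)                            # R = R1 + R2
```

The approach works whenever the partial results can be combined with an associative operation, such as the sum used here.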
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set

Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results into one

[Diagram: in client-server processing, Data 1 and Data 2 travel from two server nodes (on-disk) to the client node; in co-located processing, the client node sends the request to two server nodes (on-disk) and receives only their results]
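
A rough way to see why co-located processing helps is to count how many values travel to the client in each mode. The node contents and both helper functions below are hypothetical and run in a single process.

```python
# Values held by each simulated server node.
node_data = {"node-1": [3, 8, 15, 4], "node-2": [23, 42, 16]}


def client_server_max(nodes):
    """Fetch every value to the client, then process the entire data set there."""
    fetched = []
    for values in nodes.values():
        fetched.extend(values)               # all values cross the network
    return max(fetched), len(fetched)


def colocated_max(nodes):
    """Run the computation where the data lives and ship back one result per node."""
    partials = [max(values) for values in nodes.values()]
    return max(partials), len(partials)      # reduce partial results into one


answer_cs, moved_cs = client_server_max(node_data)
answer_co, moved_co = colocated_max(node_data)
```

Both modes give the same answer, but the second moves one value per node instead of the whole data set, and the gap grows with the amount of data stored.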
Genetic Algorithms Grid
• Biological evolution simulation
• Chromosome and genes cluster
• Collocated computation
• F = Fitness Calculation, C = Crossover, M = Mutation
• F = F1 + F2, C = C1 + C2, M = M1 + M2
[Diagram: Ignite cluster of two nodes with Durable Memory; one node runs F1, C1, M1 and the other runs F2, C2, M2]
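
The three operators named on the slide can be sketched as a minimal, single-process genetic algorithm. The toy problem (maximize the number of 1-genes), the parameters, and the elitism step are assumptions for illustration; in a cluster, the fitness, crossover, and mutation work would be split across nodes rather than run in one loop.

```python
import random

GENES = 20


def fitness(chromosome):                    # F: count of 1-genes (toy objective)
    return sum(chromosome)


def crossover(a, b, rng):                   # C: single-point crossover
    point = rng.randrange(1, GENES)
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.05):     # M: flip each gene with small probability
    return [1 - g if rng.random() < rate else g for g in chromosome]


def evolve(generations=30, size=20, seed=7):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(size)]
    history = []                            # best fitness seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: size // 2]   # fitter half breeds the next generation
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - 1)
        ]
        population = [population[0]] + children   # elitism: keep the best unchanged
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve()
```

Because the best chromosome is always carried over unchanged, the best fitness never decreases from one generation to the next.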
Machine Learning Grid
• Distributed algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Distributed core algebra: dense and sparse algebra
• Large-scale parallelization
• Multi-language support: R, C++, Python, Java, Scala, REST
• No ETL
[Diagram: three server nodes, each with Durable Memory, underneath the distributed core algebra layer]
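
To illustrate how an algorithm such as K-Means can run over partitioned data without moving it, here is a minimal one-dimensional sketch. It is not the Ignite ML API: `partial_stats`, `kmeans`, and the sample data are hypothetical, and the partitions are plain lists in one process.

```python
def partial_stats(points, centroids):
    """Per-partition step: assign local points to the nearest centroid."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        sums[nearest] += p
        counts[nearest] += 1
    return sums, counts


def kmeans(partitions, centroids, iterations=10):
    for _ in range(iterations):
        # Each partition reports only small per-centroid sums and counts.
        stats = [partial_stats(part, centroids) for part in partitions]
        updated = []
        for i, old in enumerate(centroids):
            total = sum(sums[i] for sums, _ in stats)
            count = sum(counts[i] for _, counts in stats)
            updated.append(total / count if count else old)
        centroids = updated
    return centroids


# Two lists stand in for the data held by two server nodes.
partitions = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
centroids = kmeans(partitions, [0.0, 5.0])
```

Only the per-centroid sums and counts leave each partition, which is one way to read the "No ETL" point: the training data stays where it is stored.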
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
Any Questions?
#apacheignite
#denismagda
