© 2019 TigerGraph. All Rights Reserved
Graph Gurus 25
Unleash the Business Value of Your Data Lake
with Graph Analytics
Today’s Presenter
David Schexnayder
Senior Sales Engineer
● 20+ years in tech industry
● 7+ years with Hadoop / Big Data
● Bachelor of Science from North Carolina State
University
Some Housekeeping Items
● Although your phone is muted, we do want to answer your questions:
submit them at any time using the Q&A tab in the menu
● The webinar is being recorded and will be uploaded to our website shortly
(https://www.tigergraph.com/webinars-and-events/), and the URL will be
emailed to you
● If you have issues with Zoom, please contact the panelists via chat
Our Agenda
1 Introduction
2 Data Lakes - Characteristics and Challenges
3 Why Graph DB and Analytics
4 Why TigerGraph
5 How TigerGraph Fits
6 Resources
The Evolution of Databases
Relational Database
• Rigid schema
• High performance for transactions
• Poor performance for deep analytics
Key-Value Database
• Highly fluid schema/no schema
• High performance for simple transactions
• Poor performance for deep analytics
Graph Database
• Flexible schema
• High performance for complex transactions
• High performance for deep analytics
[Diagram: the same Customer, Product, Supplier, Location, Order and Payment data shown as relational tables, as key-value pairs, and as a graph with edges labeled PURCHASED, RESIDES, SHIPS TO, SHIPS FROM, ACCEPTED, MAKES and NOTIFIES. Location 1 = Delivery Location; Location 2 = Warehouse.]
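To make the graph column concrete, here is a minimal sketch of a property graph held in memory, assuming nothing about TigerGraph's actual storage. The vertex ids, the edge directions and the `Graph` class are hypothetical; only the vertex and edge type names come from the slide. It shows why a multi-hop question becomes a chain of neighbor lookups rather than a chain of table joins.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph with typed vertices and typed, directed edges."""

    def __init__(self):
        self.vertices = {}                    # vertex id -> vertex type
        self.out_edges = defaultdict(list)    # vertex id -> [(edge type, target id)]

    def add_vertex(self, vid, vtype):
        self.vertices[vid] = vtype

    def add_edge(self, src, etype, dst):
        self.out_edges[src].append((etype, dst))

    def neighbors(self, vid, etype):
        """Follow edges of one type out of a single vertex."""
        return [dst for (t, dst) in self.out_edges[vid] if t == etype]


# Hypothetical sample data; the ids and edge directions are assumptions.
g = Graph()
for vid, vtype in [("alice", "Customer"), ("widget", "Product"),
                   ("acme", "Supplier"), ("warehouse_2", "Location")]:
    g.add_vertex(vid, vtype)
g.add_edge("alice", "PURCHASED", "widget")
g.add_edge("acme", "MAKES", "widget")
g.add_edge("widget", "SHIPS_FROM", "warehouse_2")

# Two-hop question: from which locations do Alice's purchases ship?
ship_locations = [loc
                  for product in g.neighbors("alice", "PURCHASED")
                  for loc in g.neighbors(product, "SHIPS_FROM")]
print(ship_locations)  # ['warehouse_2']
```

In a relational layout the same question would typically join Customer, Order, Product and Location tables; here each hop only touches the edges stored with the current vertex.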
Characteristics of Data Lakes
Up to Petabytes of data from diverse sources
• Non-relational and relational data
• Web logs
• Social media feeds
• Collected data
Multiple engines to process and analyze data
• Spark
• Hive
• Impala
Data stored in a variety of formats
• NoSQL Databases (HBase, Cassandra)
• Columnar Storage (Parquet, ORC)
• Flat files, sequence files
Common Components of Data Lakes
Challenges of Data Lakes
● Data sprawl and duplication
● Gaining real-time insights is very difficult
● Security and data governance
● Table joins are still expensive!
Graph to the Rescue
TigerGraph lets businesses put the data in their lake to work:
• Faster insights into your connected data
• Hybrid Transactional and Analytical Processing (HTAP)
• Connect multiple data sources expressed as a graph
• Distributed, efficient storage and parallel processing
Graph Can Accelerate Your Use Cases
TigerGraph can run optimized use cases to put the data in
your lake to work:
• Fraud and Anti-Money Laundering
• Next-generation recommendation engines
• Customer journey / Customer 360
• Master Data Management
• Entity Resolution
• IT Optimization
Blocking Fraud Before Payment Is Made
● 30% increase in fraud detection
● Integrated 20+ data sources,
handling 2 billion real-time
events per day, 1 trillion edges
● Distributed across 40 standard
servers (20 with 2x replication)
● Reduced working cycle from
one hour to minutes
● Business logic implemented in
10k lines of C++
Increase Revenues with Recommendations
Diagram labels:
● Person A, Person B, Person C
● Bought item X online from Store Y
● Shops at Store Y; C (also) shops at Store Y
● Knows/likes/follows B; B commented (to C); B likes hobby H
● Is located at Z
● Item X is stored at Warehouse W; Item X has tag/feature F
● Customer visit click path, ending in "Would you like…?"
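The relationships on this slide can be turned into a simple neighbor-based recommendation. The following is a rough sketch, not TigerGraph's recommendation logic: the data, the function names and the scoring rule (count how many related people bought an item) are all assumptions made for illustration.

```python
# Hypothetical toy data loosely following the slide's picture.
bought = {"A": {"item_X"}, "B": {"item_X", "item_Q"}, "C": set()}
shops_at = {"A": {"store_Y"}, "B": set(), "C": {"store_Y"}}
follows = {"A": {"B"}, "B": {"C"}, "C": set()}


def related_people(person):
    """People one hop away: followed, followers, and shoppers at a shared store."""
    related = set(follows.get(person, set()))
    related |= {p for p, targets in follows.items() if person in targets}
    my_stores = shops_at.get(person, set())
    for other, stores in shops_at.items():
        if other != person and stores & my_stores:
            related.add(other)
    return related


def recommend(person):
    """Rank items bought by related people that this person has not bought."""
    scores = {}
    for other in related_people(person):
        for item in bought.get(other, set()) - bought.get(person, set()):
            scores[item] = scores.get(item, 0) + 1
    # Highest score first; ties broken alphabetically so the result is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("C"))  # ['item_X', 'item_Q']
```

Person C is linked to A through the shared store and to B through the follow edge, so item X (bought by both) outranks item Q (bought only by B).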
Building a Customer 360° Data Hub
Business Challenge
• Combine all available customer data with
transactions to improve business outcomes
Solution
• Build on top of current investments in master data
management, data warehouse/Hadoop data lake
and NoSQL repositories
• Find new relationships among data to drive better
fraud and money laundering detection, credit risk
scoring and monitoring, product & service
marketing, cross-sell and up-sell recommendation
for higher revenue & profits
• Analyze temporal (Time Series) and spatial data to
find new patterns and insights
• Expand schema (attributes/fields, relationships) to
accommodate new data sources & use cases
Entity Resolution for MDM with TigerGraph
Metadata Management with Graph
● Metadata is “Data about other Data”
● Types of Metadata
○ Structure - Organizational units, operations, IT infrastructure, stages/states, data
collections, database schema
○ Processes - business, operational, software, data processing
○ Process flow = directional (“Directed”) graph
● Uses
○ Establish and manage policies and processes
○ Manage data lineage, integration and sharing
○ Ensure compliance (financial, GDPR, HIPAA, etc.)
○ Security, access control
○ Identify and eliminate redundancies and inefficiencies
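Because a process flow is a directed graph, data lineage questions reduce to graph traversal. As a minimal sketch, assuming a hypothetical set of datasets and derivation edges (none of these names come from the talk), a breadth-first walk lists everything downstream of a changed source:

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets derived from it.
derived = {
    "orders_rdbms": ["orders_parquet"],
    "orders_parquet": ["orders_hive", "order_graph"],
    "customers_rdbms": ["customers_parquet"],
    "customers_parquet": ["order_graph"],
    "orders_hive": [],
    "order_graph": [],
}


def downstream(source):
    """Breadth-first walk returning every dataset affected by a change to source."""
    seen = {source}
    queue = deque([source])
    affected = []
    while queue:
        node = queue.popleft()
        for child in derived.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


print(downstream("orders_rdbms"))
# ['orders_parquet', 'orders_hive', 'order_graph']
```

Reversing the edges gives the upstream view, which is the form compliance audits usually ask for: where did this field come from?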
Improved Energy Management System
• Business Challenge: Monitor complex energy
infrastructure to detect and manage power
overloads and outages
• Solution:
• Model power system using real-time
operational Graph to accelerate state
estimation & power flow calculation
(no data preparation needed)
• Leverage massively parallel computing
in Graph for bus ordering &
admittance graph forming to balance
load spikes
• Visualize energy computation results in
Graph for contingency analysis and
action plan
• Business Value: Deliver a faster-than-real-time
EMS (i.e., an EMS capable of completing
execution within a SCADA sample cycle of 5
seconds)
Delivering Better Outcomes
Customers
Supplier
Employee
Device
IoT Signals
Orders
Payments
Shipments
Invoices
Visits
Downloads
Master Data
Operational Data
Data Warehouse
Data Mart
Data Lake
NoSQL
Historical Data
Queries / Lookups, Comprehensive Graph Patterns and Algorithms
Graph-Computed Features
Batch and Streaming
Hadoop Interoperability
Several ingestion tools integrate with your data lake:
● Spark via JDBC
● HDFS Connector
● S3 Connector
● JDBC from RDBMS
Testing Setup and Configuration
• Cloudera Quickstart VM 5.13.0 and example data
• TigerGraph Developer Edition 2.5.0
Hadoop Workflow
• Use Sqoop to pull transactional data from an RDBMS
• Build Hive tables and store them on HDFS as Parquet
• Query the data with Impala, Hive, or Spark
Spark and TigerGraph Data Pipeline
Static Data Sources
TigerGraph JDBC Driver
Streaming Data Sources
JDBC Driver (1.2)
• Type 4 driver
• Supports reads and writes (bi-directional data flow with
TigerGraph)
• Read: Converts a ResultSet to a DataFrame
• Write: Loads DataFrames and files into vertices/edges in
TigerGraph
• Supports REST endpoints of built-in, compiled and
interpreted GSQL queries
• Open Source:
https://github.com/tigergraph/ecosys/tree/master/etl/tg-jdbc-driver
JDBC Driver Installation on Cloudera
• Upgrade Java to OpenJDK 1.8.0
• Convert to parcels
• Install the Spark 2.4.0 parcel (optional)
• Build the TigerGraph JDBC driver (from GitHub)
• Install the JDBC driver and add it to ‘classpath.txt’
Relational Data Schema
Graph Data Schema
Hadoop -> TigerGraph Workflow
• Use Apache Spark to read the data on HDFS in Parquet
format
• Use the TigerGraph Spark JDBC connector to load the data
into the graph
• Explore the graph
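The first two steps of this workflow might look roughly like the PySpark sketch below. It is an illustration under stated assumptions, not the demo's actual script: `driver`, `url`, `dbtable` and `batchsize` are standard Spark JDBC data source options, but the driver class name, the `jdbc:tg:` URL scheme, the `graph` and `username` keys, the `vertex <type>` table syntax and the save mode are assumptions that should be checked against the driver's README for your version. The host, port, graph name, vertex type and HDFS path are hypothetical.

```python
def tigergraph_jdbc_options(host, port, graph, vertex_type, username, password,
                            batch_size=100):
    """Build the option map handed to Spark's generic JDBC writer.

    All TigerGraph-specific keys and values here are ASSUMPTIONS; verify them
    against the README of the tg-jdbc-driver version you build.
    """
    return {
        "driver": "com.tigergraph.jdbc.Driver",   # assumed driver class name
        "url": f"jdbc:tg:http://{host}:{port}",   # assumed URL scheme
        "username": username,                     # assumed option key
        "password": password,
        "graph": graph,                           # assumed option key
        "dbtable": f"vertex {vertex_type}",       # assumed target syntax
        "batchsize": str(batch_size),
    }


def load_parquet_into_graph(parquet_path, options):
    """Read Parquet files from HDFS with Spark and write the rows through JDBC."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-to-tigergraph").getOrCreate()
    df = spark.read.parquet(parquet_path)
    # The save mode the driver expects is an assumption; check its README.
    df.write.format("jdbc").options(**options).mode("overwrite").save()
    spark.stop()


# Hypothetical connection details, for illustration only.
opts = tigergraph_jdbc_options("tg-host.example", 14240, "retail", "Customer",
                               "tigergraph", "change-me")
print(opts["url"], "|", opts["dbtable"])
```

Calling `load_parquet_into_graph("hdfs:///user/hive/warehouse/customers", opts)` would then need PySpark plus the driver JAR on the Spark classpath, which is what the installation steps earlier in the deck set up.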
Demo!
What Makes TigerGraph Fast and Scalable?
Designed from the ground up for computational parallelism
• Native graph storage
• Parallel graph computation engine
• Deep link analytics in real time
• 10X+ compression
• Distributed graph for scale-out across multiple machines
Q&A
Please submit your questions via the Q&A tab in Zoom
Additional Learning Resources
Start Free at TigerGraph Cloud Today!
https://www.tigergraph.com/cloud/
Test Drive Online Demo
https://www.tigergraph.com/demo
Download the Developer Edition
https://www.tigergraph.com/download/
Guru Scripts
https://github.com/tigergraph/ecosys/tree/master/guru_scripts
Join our Developer Forum
https://groups.google.com/a/opengsql.org/forum/#!forum/gsql-users
Coming to a City Near You
Let us know if you would like to help organize a Graph Gurus
Comes To You workshop in your city
https://info.tigergraph.com/graph-gurus-request
Thank You

