info@rittmanmead.com www.rittmanmead.com @rittmanmead
Oracle Big Data Spatial & Graph

Social Media Analysis - Case Study
Mark Rittman, CTO, Rittman Mead
BIWA Summit 2016, San Francisco, January 2016
About Rittman Mead
•Oracle Gold Partner with offices in the UK and USA (Atlanta)
•70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects
•Oracle ACE Director (Mark Rittman, CTO) + 2 Oracle ACEs
•Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com)
•Regular users of social media (Facebook, Twitter, SlideShare etc)
•Regular column in Oracle Magazine and other publications
•Hadoop R&D lab for “dogfooding” solutions developed for customers
Business Scenario
•Rittman Mead want to understand drivers and audience for their website
‣What is our most popular content? Who are the most in-demand blog authors?
‣Who are the influencers? What communities exist around our web presence?
•Three data sources in scope:
‣RM website logs
‣Twitter stream
‣Website posts, comments etc
Overall Project Architecture - Phase 1
•Initial iteration of project focused on capturing and ingesting web + social media activity
•Apache Flume used for capturing website hits, page views
•Twitter Streaming API used to capture tweets referring to RM website or RM staff
•Activity landed into Hadoop (HDFS), then processed, enriched and presented using Hive
Real-Time Metrics around Site Activity - “What?”
•Provided real-time counts of page views, correlated with Twitter activity stored in Hive tables
•Accessed using Oracle Big Data SQL + joined to Oracle RDBMS reference data
•Delivered using OBIEE reports and dashboards
•Data Warehousing, but cheaper + real-time
•Answered questions such as
‣What are our most popular site pages?
‣Which pages attracted the most attention on Twitter, Facebook?
‣What topics are popular?
Combine with Oracle Big Data SQL
for structured OBIEE dashboard analysis
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
Oracle BDD for Data Wrangling + Data Enrichment
•Oracle Big Data Discovery used to go back to the raw event data and add more meaning
•Enrich data, extract nouns + terms, add reference data from file, RDBMS etc
•Understand sentiment + meaning of tweets, link disparate + loosely coupled events
•Faceted search dashboards
OBIEE and BDD for the “What” and “Why” Questions…
•Counts of page views, tweets, mentions etc helped us understand what content was popular
•Analysis of tweet sentiment, meaning and correlation with content answered why
Combine with Oracle Big Data SQL
for structured OBIEE dashboard analysis
Combine with site content, semantics, text enrichment
Catalog and explore using Oracle Big Data Discovery
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
Why is some content more popular?
Does sentiment affect viewership?
What content is popular, where?
But Who Are The Influencers In Our Community?
•Previous counts assumed that all tweet references were equally important
•But some Twitter users are far more influential than others
‣Sit at the centre of a community, have 1000s of followers
‣A reference by them has massive impact on page views
‣Positive or negative comments from them drive perception
•Can we identify them?
‣Potentially “reach out” with analyst program
‣Study what website posts go “viral”
‣Understand our audience, and the conversation, better
Influencer Identification: from a communication stream (e.g. tweets), find the people that are central in the given network, e.g. influencer marketing
What Communities and Networks Are Our Audience?
•Rittman Mead website features many types of content
‣Blogs on BI, data integration, big data, data warehousing
‣Op-Eds (“OBIEE12c - Three Months In, What’s the Verdict?”)
‣Articles on a theme, e.g. performance tuning
‣Details of new courses, new promotions
•Different communities likely to form around these content types
•Different influencers and patterns of recommendation, discovery
•Can we identify some of the communities, segment our audience?
Community Detection: identify groups of people that are close to each other, e.g. target group marketing
Tabular (SQL) Query Tools Aimed at Counts + Aggs
Property Graph Usage Scenarios
•Finance
‣Fraud detection, cross marketing
•Telecommunications
‣Call records analysis
•Retail
‣Recommendation, sentiment analysis
•Social
‣Network analytics, influencers, clustering
•Health Care
‣Doctor, patient, diagnosis, treatment analysis
Common Big Data Graph Analysis Use-Cases
•Product Recommendation (purchase records linking customers and items): recommend the most similar item purchased by similar people
•Influencer Identification (communication stream, e.g. tweets): find the people that are central in the given network, e.g. influencer marketing
•Community Detection: identify groups of people that are close to each other, e.g. target group marketing
•Graph Pattern Matching: find all the sets of entities that match the given pattern, e.g. fraud detection
Graph Example : RM Blog Post Referenced on Twitter
[Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” passed along a chain of “Follows” edges, with a page-view counter shown at each step]
Network Effect Magnified by Extent of Social Graph
[Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” shown twice within a wider social graph, each copy with its own page-view counter]
Retweets by Influential Twitter Users Drive Visits
[Figure: the original tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” and its retweet (“RT: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”), joined by a “Retweet” edge, each with a page-view counter]
Retweets, Mentions and Replies Create Communities
[Figure: Twitter users linked by “Retweet”, “Reply” and “Mention” edges, forming groups around hashtags such as #bigdatasql and #thatswhatshesaid]
Property Graph Terminology
[Figure: two Twitter users, each a node, or “vertex”, carrying vertex properties, joined by a directed connection, or “edge”, of edge type “Mentions”, for the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”]
Property Graph Terminology
[Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” shown as two directed connections, or “edges”, between nodes, or “vertices”: one of type “Mentions” and one of type “Retweets”]
Determining Influencers - Factors to Consider
•Different types of Twitter interaction could imply more or less “influence”
‣Retweet of another user’s tweet implies that person is worth quoting or you endorse their opinion
‣Reply to another user’s tweet could be a weaker recognition of that person’s opinion or view
‣Mention of a user in a tweet is a weaker recognition that they are part of a community / debate
Relative Importance of Edge Types Added via Weights
[Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” shown as two edges, “Mentions” with weight = 30 and “Retweet” with weight = 100; the weight is an edge property]
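The weighting idea on this slide can be sketched in Java as a lookup from edge type to weight. The values 100 (retweet) and 30 (mention) come from the slide. The class name, the method name, the reply weight of 50 and the default of 1.0 are assumptions made purely for illustration, since the deck does not state a weight for replies.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a Twitter interaction type to an edge weight. */
final class EdgeWeights {
    // 100 and 30 are the weights shown on the slide; 50 for replies is an assumed value.
    private static final Map<String, Double> WEIGHTS = Map.of(
            "retweet", 100.0,
            "reply", 50.0,
            "mentions", 30.0);

    /** Returns the weight for an edge type, or an assumed default of 1.0 for unlisted types. */
    static double weightFor(String edgeType) {
        return WEIGHTS.getOrDefault(edgeType.toLowerCase(Locale.ROOT), 1.0);
    }

    public static void main(String[] args) {
        System.out.println(weightFor("Retweet"));
        System.out.println(weightFor("mentions"));
    }
}
```

Keeping the weights in one table makes it easy to experiment with different relative strengths before regenerating the edge file.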
Oracle Big Data Spatial & Graph
•Graph, spatial and raster data processing for big data
‣Primarily documented + tested against Oracle BDA
‣Installable on commodity cluster using CDH
•Data stored in Apache HBase or Oracle NoSQL DB
‣Complements Spatial & Graph in Oracle Database
‣Designed for trillions of nodes, edges etc
•Out-of-the-box spatial enrichment services
•Over 35 of the most popular graph analysis functions
‣Graph traversal, recommendations
‣Finding communities and influencers
‣Pattern matching
Oracle Big Data Graph and Spatial Architecture
•Data loaded from files or through Java API into HBase
•In-Memory Analytics layer runs common graph and spatial algorithms on data
•Visualised using R or other graphics packages
Massively Scalable Graph Store
• Oracle NoSQL
• HBase
Lightning-Fast In-Memory Analytics
• YARN Container
• Standalone Server
• Embedded
Preparing Vertices and Edges for Ingestion
•ODI12c used to prepare two files in Oracle Flat File Format
‣Extracted vertices and edges from existing data in Hive
‣Wrote vertices (Twitter users) to .opv file, edges (RTs, replies etc) to .ope file
•For exercise, only considered 2-3 days of tweets
‣Did not include follows (user A followed user B) as not reported by Twitter Streaming API
‣Could approximate larger follower networks through multiplying weight of edge by follower scale
-Useful for Page Rank, but does it skew actual detection of influencers in exercise?
Oracle Flat File Format Vertex and Edge Files
Vertex File (.opv)
• Unique ID for the vertex
• Property name (“name”)
• Property value datatype (1 = String)
• Property value (“markrittman”)
Edge File (.ope)
• Unique ID for the edge
• Leading edge vertex ID
• Trailing edge vertex ID
• Edge Type (“mentions”)
• Edge Property (“weight”)
• Edge Property datatype and value
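As a rough illustration of the fields listed above, the following Java sketch formats one vertex record and one edge record. The field order follows the slide. The trailing empty columns (separate string, number and date value slots), the datatype code 4 for a double, and all class and method names are assumptions that should be checked against Oracle's flat file documentation before use.

```java
/** Hypothetical formatter for Oracle flat file (.opv / .ope) records. */
final class FlatFileLines {
    // 1 = String is stated on the slide; 4 = Double is an assumption.
    static final int TYPE_STRING = 1;
    static final int TYPE_DOUBLE = 4;

    /** Vertex record: id, property name, datatype, string value, then empty number and date slots. */
    static String vertexLine(long id, String name) {
        return id + ",name," + TYPE_STRING + "," + name + ",,";
    }

    /** Edge record: id, leading vertex, trailing vertex, edge type, property name, datatype, values. */
    static String edgeLine(long id, long fromVertex, long toVertex, String type, double weight) {
        return id + "," + fromVertex + "," + toVertex + "," + type
                + ",weight," + TYPE_DOUBLE + ",," + weight + ",";
    }

    public static void main(String[] args) {
        System.out.println(vertexLine(1, "markrittman"));
        System.out.println(edgeLine(1, 3, 1, "mentions", 30.0));
    }
}
```

The sketch does not escape commas or spaces inside property values, which a real export would need to handle.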
Loading Edges and Vertices into HBase
Uses “Gremlin” Shell for HBase
cfg = GraphConfigBuilder.forPropertyGraphHbase()
  .setName("connectionsHBase")
  .setZkQuorum("bigdatalite").setZkClientPort(2181)
  .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)
  .setInitialVertexNumRegions(3).setSplitsPerRegion(1)
  .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
  .build();
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();
vfile="../../data/biwa_connections.opv"
efile="../../data/biwa_connections.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);
// read through the vertices
opg.getVertices();
// read through the edges
opg.getEdges();
• Creates connection to HBase
• Sets initial configuration for database
• Builds the database ready for load
• Defines location of Vertex and Edge files
• Creates instance of OraclePropertyGraphDataLoader
• Loads data from files
• Prepares the property graph for use
• Loads in Edges and Vertices
• Now ready for in-memory processing
Calculating Most Influential Tweeters Using Page Rank
vOutput="/tmp/mygraph.opv"
eOutput="/tmp/mygraph.ope"
OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);
session = Pgx.createSession("session-id-1");
analyst = session.createAnalyst();
graph = session.readGraphWithProperties(opg.getConfig());
rank = analyst.pagerank(graph, 0.001, 0.85, 100);
rank.getTopKValues(10);
==>PgxVertex with ID 1=0.13885623487462861
==>PgxVertex with ID 3=0.08686102641801993
==>PgxVertex with ID 101=0.06757752513733056
==>PgxVertex with ID 6=0.06743774001139484
==>PgxVertex with ID 37=0.0481517609757462
==>PgxVertex with ID 17=0.042234536894569276
==>PgxVertex with ID 29=0.04109794527311113
==>PgxVertex with ID 65=0.032058649698044187
==>PgxVertex with ID 15=0.023075360575195276
==>PgxVertex with ID 93=0.019265959946506813
• Initiates an in-memory analytics session
• Runs Page Rank algorithm to determine influencers
• Outputs top ten vertices (users)
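To show what a Page Rank call of this shape computes, here is a minimal, unweighted power-iteration sketch in plain Java. It is not the in-memory analytics implementation used in the deck. The three numeric arguments are assumed to play the same roles as in the call above (maximum error 0.001, damping factor 0.85, at most 100 iterations), and the tiny graph is made up. Edges are assumed to point from the user who retweets or mentions to the user being referenced, so rank flows towards the people being talked about.

```java
import java.util.Arrays;

/** Minimal, unweighted PageRank by power iteration (illustrative only). */
final class PageRankSketch {
    /**
     * @param edges    directed edges as {from, to} pairs over vertices 0..n-1
     * @param n        number of vertices
     * @param maxError stop once the summed change in rank falls below this
     * @param damping  probability of following an edge rather than jumping at random
     * @param maxIter  upper bound on the number of iterations
     */
    static double[] pagerank(int[][] edges, int n, double maxError, double damping, int maxIter) {
        int[] outDegree = new int[n];
        for (int[] e : edges) outDegree[e[0]]++;

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIter; iter++) {
            // Rank held by vertices with no outgoing edges is shared evenly.
            double dangling = 0.0;
            for (int v = 0; v < n; v++) if (outDegree[v] == 0) dangling += rank[v];

            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);
            for (int[] e : edges) next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];

            double change = 0.0;
            for (int v = 0; v < n; v++) change += Math.abs(next[v] - rank[v]);
            rank = next;
            if (change < maxError) break;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: users 1, 2 and 3 all reference user 0; user 0 references user 1.
        int[][] edges = {{1, 0}, {2, 0}, {3, 0}, {0, 1}};
        System.out.println(Arrays.toString(pagerank(edges, 4, 0.001, 0.85, 100)));
    }
}
```

In this toy graph user 0 ends up with the highest score because three users point at it, and user 1 comes second because it is referenced by the top-ranked user.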
Calculating Most Influential Tweeters Using Page Rank
v1=opg.getVertex(1l); v2=opg.getVertex(3l); v3=opg.getVertex(101l); 
v4=opg.getVertex(6l); v5=opg.getVertex(37l); v6=opg.getVertex(17l); 
v7=opg.getVertex(29l); v8=opg.getVertex(65l); v9=opg.getVertex(15l); 
v10=opg.getVertex(93l);
System.out.println("Top 10 influencers: \n " + v1.getProperty("name") + 
"\n " + v2.getProperty("name") + 
"\n " + v3.getProperty("name") + 
"\n " + v4.getProperty("name") + 
"\n " + v5.getProperty("name") + 
"\n " + v6.getProperty("name") + 
"\n " + v7.getProperty("name") + 
"\n " + v8.getProperty("name") + 
"\n " + v9.getProperty("name") + 
"\n " + v10.getProperty("name"));
Top 10 influencers:
markrittman
rmoff
rittmanmead
mRainey
JeromeFr
Nephentur
borkur
BIExperte
i_m_dave
dw_pete
Note: based on Twitter users referencing RM website + staff accounts over a 3-day period in May 2015
Visualising Property Graphs with Cytoscape
•Open source graph analysis tool with Oracle Big Data Graph and Spatial Plug-in
•Available shortly from Oracle, connects to Oracle NoSQL or HBase and runs Page Rank etc
•Alternative to command-line for In-Memory Analytics once base graph created
Calculating Top 10 Users using Page Rank Algorithm
Top 10 influencers:
markrittman
rmoff
rittmanmead
mRainey
JeromeFr
Nephentur
borkur
BIExperte
i_m_dave
dw_pete
Visualising the Social Graph Around Particular Users
Calculating Shortest Path Between Users
Edge Bundling to Better Illustrate Connection Frequency
Determining Communities via Twitter Interactions
• Clusters based on actual interaction patterns, not hashtags
• Detects real communities, not ones that exist just in theory
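As a much cruder stand-in for the clustering shown in the deck, the sketch below groups users purely from who interacted with whom, ignoring hashtags, by finding connected components with a union-find structure. This is not the algorithm used by Cytoscape or by Oracle's library, which the deck does not name. The class, its methods and the account names are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Groups users that are linked by any chain of interactions (connected components). */
final class InteractionGroups {
    private final Map<String, String> parent = new HashMap<>();

    /** Finds the representative user of the group containing the given user. */
    String find(String user) {
        parent.putIfAbsent(user, user);
        String root = user;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        parent.put(user, root); // shorten the path for later lookups
        return root;
    }

    /** Records an interaction (retweet, reply or mention) between two users. */
    void interact(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        parent.put(rootA, rootB);
    }

    /** True when the two users are connected through some chain of interactions. */
    boolean sameGroup(String a, String b) {
        return find(a).equals(find(b));
    }

    public static void main(String[] args) {
        InteractionGroups groups = new InteractionGroups();
        // Hypothetical interactions between made-up accounts.
        groups.interact("alice", "bob");
        groups.interact("bob", "carol");
        groups.interact("dave", "erin");
        System.out.println(groups.sameGroup("alice", "carol"));
        System.out.println(groups.sameGroup("alice", "dave"));
    }
}
```

Connected components only separate users who never interact at all; real community detection also splits densely connected graphs into tighter clusters, which this sketch does not attempt.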
Conclusions, and Further Reading
•Tools such as OBIEE are great for understanding “what?” (counts, page views, popular items)
•Oracle Big Data Discovery can be useful for understanding “why?” (sentiment, terms etc)
•Graph Analysis can help answer “who?”
•Who are our audience? What are our communities? Who are their important influencers?
•Oracle Big Data Graph and Spatial can answer these questions at “big data” scale
•Articles on the Rittman Mead Blog
‣http://www.rittmanmead.com/category/oracle-big-data-appliance/
‣http://www.rittmanmead.com/category/big-data/
‣http://www.rittmanmead.com/category/oracle-big-data-discovery/
•Rittman Mead offer consulting, training and managed services for Oracle Big Data
‣http://www.rittmanmead.com/bigdata

More Related Content

PPTX
Unlock the value in your big data reservoir using oracle big data discovery a...
PDF
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
PDF
What is Big Data Discovery, and how it complements traditional business anal...
PDF
Oracle big data spatial and graph
PDF
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
PDF
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
PDF
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
PDF
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
Unlock the value in your big data reservoir using oracle big data discovery a...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
What is Big Data Discovery, and how it complements traditional business anal...
Oracle big data spatial and graph
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...

What's hot (20)

PDF
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
PDF
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
PDF
Deploying Full BI Platforms to Oracle Cloud
PDF
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
PDF
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
PDF
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
PDF
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
PDF
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
PDF
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
PDF
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
PDF
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
PPTX
Dataware house Introduction By Quontra Solutions
PDF
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
PDF
ODI12c as your Big Data Integration Hub
PDF
How a Tweet Went Viral - BIWA Summit 2017
PDF
Turn Data Into Actionable Insights - StampedeCon 2016
PDF
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
PDF
Big Data for Managers: From hadoop to streaming and beyond
PDF
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
PDF
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...
Deploying Full BI Platforms to Oracle Cloud
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
Dataware house Introduction By Quontra Solutions
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
ODI12c as your Big Data Integration Hub
How a Tweet Went Viral - BIWA Summit 2017
Turn Data Into Actionable Insights - StampedeCon 2016
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Big Data for Managers: From hadoop to streaming and beyond
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Ad

Viewers also liked (20)

PDF
Data modeling for Elasticsearch
PDF
Social media with big data analytics
PDF
Social network analysis & Big Data - Telecommunications and more
PDF
Big Data Analytics : A Social Network Approach
PPTX
Social Media Analytics Demystified
PDF
Analytics for Social Media
PDF
Big data and Social Media Analytics
PPT
Big data ppt
PPTX
What is Big Data?
PPTX
A Different Perspective on Business with Social Data
PPTX
Telecom Data Analysis Using Social Media Feeds
PPTX
Social Media Analysis of Political Parties for Delhi Assembly Election 2015
PDF
Analysing the digital traces of Social Media users
PPTX
Social Media in Australia: The Case of Twitter
PDF
Social networks, activities, and travel - building links to understand behaviour
PPSX
Multimedia Data Collection using Social Media Analysis
PPT
Spatio-temporal demographic classification of the Twitter users
PPTX
Friendship and mobility user movement in location based social networks
PPTX
Statistical analytical programming for social media analysis .
PPTX
A guide to realistic social media and measurement
Data modeling for Elasticsearch
Social media with big data analytics
Social network analysis & Big Data - Telecommunications and more
Big Data Analytics : A Social Network Approach
Social Media Analytics Demystified
Analytics for Social Media
Big data and Social Media Analytics
Big data ppt
What is Big Data?
A Different Perspective on Business with Social Data
Telecom Data Analysis Using Social Media Feeds
Social Media Analysis of Political Parties for Delhi Assembly Election 2015
Analysing the digital traces of Social Media users
Social Media in Australia: The Case of Twitter
Social networks, activities, and travel - building links to understand behaviour
Multimedia Data Collection using Social Media Analysis
Spatio-temporal demographic classification of the Twitter users
Friendship and mobility user movement in location based social networks
Statistical analytical programming for social media analysis .
A guide to realistic social media and measurement
Ad

Similar to Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study (20)

PDF
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
PPT
Building the Inform Semantic Publishing Ecosystem: from Author to Audience
PDF
The Maturity Model: Taking the Growing Pains Out of Hadoop
PPTX
Semantics and Machine Learning
PPTX
Architecting for Big Data: Trends, Tips, and Deployment Options
PDF
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
PDF
Open Data Summit Presentation by Joe Olsen
PDF
Continuum Analytics and Python
PDF
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
PPTX
Business Intelligence and Big Data in Cloud
PDF
Let's analyze how world reacts to road traffic by sentiment analysis final
PDF
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
PDF
Big Data in Action – Real-World Solution Showcase
PDF
Sourcing with Social Media: Tips from a Corporate Sleuth by Sean Campbell
PDF
Analytical Innovation: How to Build the Next Generation Data Platform
PPTX
Accelerating Data Lakes and Streams with Real-time Analytics
PDF
A6 big data_in_the_cloud
PPTX
Designing Big Content - Search Exchange 2013
PDF
Webinar Structured Data
PDF
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
Building the Inform Semantic Publishing Ecosystem: from Author to Audience
The Maturity Model: Taking the Growing Pains Out of Hadoop
Semantics and Machine Learning
Architecting for Big Data: Trends, Tips, and Deployment Options
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Data Summit Presentation by Joe Olsen
Continuum Analytics and Python
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Business Intelligence and Big Data in Cloud
Let's analyze how world reacts to road traffic by sentiment analysis final
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) (ODTUG 2013)
Big Data in Action – Real-World Solution Showcase
Sourcing with Social Media: Tips from a Corporate Sleuth by Sean Campbell
Analytical Innovation: How to Build the Next Generation Data Platform
Accelerating Data Lakes and Streams with Real-time Analytics
A6 big data_in_the_cloud
Designing Big Content - Search Exchange 2013
Webinar Structured Data

More from Mark Rittman (9)

PDF
The Future of Analytics, Data Integration and BI on Big Data Platforms
PDF
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
PDF
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
PDF
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
PDF
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
PDF
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
PDF
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
PDF
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
PDF
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
The Future of Analytics, Data Integration and BI on Big Data Platforms
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c

Recently uploaded (20)

PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PDF
Lecture1 pattern recognition............
PDF
.pdf is not working space design for the following data for the following dat...
PPT
Quality review (1)_presentation of this 21
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
PDF
Fluorescence-microscope_Botany_detailed content
PPTX
Business Acumen Training GuidePresentation.pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPTX
Database Infoormation System (DBIS).pptx
PDF
Foundation of Data Science unit number two notes
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PDF
Launch Your Data Science Career in Kochi – 2025
Acceptance and paychological effects of mandatory extra coach I classes.pptx
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
Lecture1 pattern recognition............
.pdf is not working space design for the following data for the following dat...
Quality review (1)_presentation of this 21
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
IB Computer Science - Internal Assessment.pptx
ALIMENTARY AND BILIARY CONDITIONS 3-1.pptx
Fluorescence-microscope_Botany_detailed content
Business Acumen Training GuidePresentation.pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
Database Infoormation System (DBIS).pptx
Foundation of Data Science unit number two notes
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Launch Your Data Science Career in Kochi – 2025

Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study

  • 1. info@rittmanmead.com www.rittmanmead.com @rittmanmead Oracle Big Data Spatial & Graph
 Social Media Analysis - Case Study Mark Rittman, CTO, Rittman Mead BIWA Summit 2016, San Francisco, January 2016
  • 2. info@rittmanmead.com www.rittmanmead.com @rittmanmead 2 •Oracle Gold Partner with offices in the UK and USA (Atlanta) •70+ staff delivering Oracle BI, DW, Big Data and Advanced Analytics projects •Oracle ACE Director (Mark Rittman, CTO) + 2 Oracle ACEs •Significant web presence with the Rittman Mead Blog (http://www.rittmanmead.com) •Regular sers of social media 
 (Facebook, Twitter, Slideshare etc) •Regular column in Oracle Magazine 
 and other publications •Hadoop R&D lab for “dogfooding” 
 solutions developed for customers About Rittman Mead
  • 3. info@rittmanmead.com www.rittmanmead.com @rittmanmead 3 Business Scenario •Rittman Mead want to understand drivers and audience for their website ‣What is our most popular content? Who are the most in-demand blog authors? ‣Who are the influencers? What communities exist around our web presence? •Three data sources in scope: RM Website Logs Twitter Stream Website Posts, Comments etc
  • 4. info@rittmanmead.com www.rittmanmead.com @rittmanmead X •Initial iteration of project focused on capturing and ingesting web + social media activity •Apache Flume used for capturing website hits, page views •Twitter Streaming API used to capture tweets referring to RM website or RM staff •Activity landed into Hadoop (HDFS), processed and enriched and presented using Hive Overall Project Architecture - Phase 1
  • 5. info@rittmanmead.com www.rittmanmead.com @rittmanmead X •Provided real-time counts of page views, correlated with Twitter activity stored in Hive tables •Accessed using Oracle Big Data SQL +
 joined to Oracle RDBMS reference data •Delivered using OBIEE reports and dashboards •Data Warehousing, but cheaper + real-time •Answered questions such as ‣What are our most popular site pages? ‣Which pages attracted the most
 attention on Twitter, Facebook? ‣What topics are popular? Real-Time Metrics around Site Activity - “What?” Combine with Oracle Big Data SQL for structured OBIEE dashboard analysis What pages are people visiting? Who is referring to us on Twitter? What content has the most reach?
  • 6. info@rittmanmead.com www.rittmanmead.com @rittmanmead X •Oracle Big Data Discovery used to go back to the raw event data add more meaning •Enrich data, extract nouns + terms, add reference data from file, RDBMS etc •Understand sentiment + meaning of tweets, link disparate + loosely coupled events •Faceted search dashboards Oracle BDD for Data Wrangling + Data Enrichment
  • 7. info@rittmanmead.com www.rittmanmead.com @rittmanmead 4 OBIEE and BDD for the “What” and “Why” Questions… •Counts of page views, tweets, mentions etc helped us understand what content was popular •Analysis of tweet sentiment, meaning and correlation with content answered why Combine with Oracle Big Data SQL for structured OBIEE dashboard analysis Combine with site content, semantics, text enrichment Catalog and explore using Oracle Big Data Discovery What pages are people visiting? Who is referring to us on Twitter? What content has the most reach? Why is some content more popular? Does sentiment affect viewership? What content is popular, where?
  • 8. info@rittmanmead.com www.rittmanmead.com @rittmanmead 5 •Previous counts assumed that all tweet references equally important •But some Twitter users are far more influential than others ‣Sit at the centre of a community, have 1000’s of followers ‣A reference by them has massive impact on page views ‣Positive or negative comments from them drive perception •Can we identify them? ‣Potentially “reach out” with analyst program ‣Study what website posts go “viral” ‣Understand out audience, and the conversation, better But Who Are The Influencers In Our Community? Influencer Identification Communication Stream (e.g. tweets) Find out people that are central in the given network – e.g. influencer marketing
  • 9. info@rittmanmead.com www.rittmanmead.com @rittmanmead 6 •Rittman Mead website features many types of content ‣Blogs on BI, data integration, big data, data warehousing ‣Op-Eds (“OBIEE12c - Three Months In, What’s the Verdict?”) ‣Articles on a theme, e.g. performance tuning ‣Details of new courses, new promotions •Different communities likely to form around these content types •Different influencers and patterns of recommendation, discovery •Can we identify some of the communities, segment our audience? What Communities and Networks Are Our Audience? Community Detection Identify group of people that are close to each other – e.g. target group marketing
  • 10. Tabular (SQL) Query Tools Aimed at Counts + Aggs
  • 11. Property Graph Usage Scenarios
  •Finance
  ‣Fraud detection, cross marketing
  •Telecommunications
  ‣Call records analysis
  •Retail
  ‣Recommendation, sentiment analysis
  •Social
  ‣Network analytics, influencers, clustering
  •Health Care
  ‣Doctor, patient, diagnosis, treatment analysis
  • 12. Common Big Data Graph Analysis Use-Cases
  ‣Product Recommendation (purchase records: customers, items): recommend the most similar item purchased by similar people
  ‣Influencer Identification (communication stream, e.g. tweets): find the people who are central in the given network, e.g. influencer marketing
  ‣Community Detection: identify groups of people that are close to each other, e.g. target group marketing
  ‣Graph Pattern Matching: find all the sets of entities that match the given pattern, e.g. fraud detection
  • 13. Graph Example : RM Blog Post Referenced on Twitter
  [Figure: a tweet linking to “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI) passes from user to user along “Follows” edges, with a Page Views counter shown at each step]
  • 14. Network Effect Magnified by Extent of Social Graph
  [Figure: the same tweet, “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI), spreading across a wider social graph, with Page Views counters shown at two stages]
  • 15. Retweets by Influential Twitter Users Drive Visits
  [Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI) is retweeted (“RT: …”) by an influential user, with Page Views counters shown before and after the retweet]
  • 16. Retweets, Mentions and Replies Create Communities
  [Figure: users connected by Retweet, Reply and Mention edges, with the hashtags #bigdatasql and #thatswhatshesaid shown]
  • 17. Property Graph Terminology
  [Figure: two users joined by a “Mentions” link for the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI), with these labels]
  ‣Node, or “Vertex” (one at each end of the link)
  ‣Directed Connection, or “Edge”
  ‣Edge Type (“Mentions”)
  ‣Vertex Properties
  • 18. Property Graph Terminology
  [Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI) shown on a “Mentions” edge and on a “Retweets” edge, with these labels]
  ‣Node, or “Vertex”
  ‣Directed Connection, or “Edge”
  • 19. Determining Influencers - Factors to Consider
  •Different types of Twitter interaction could imply more or less “influence”
  ‣Retweet of another user’s tweet implies that person is worth quoting or you endorse their opinion
  ‣Reply to another user’s tweet could be a weaker recognition of that person’s opinion or view
  ‣Mention of a user in a tweet is a weaker recognition that they are part of a community / debate
  • 20. Relative Importance of Edge Types Added via Weights
  [Figure: the tweet “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools” (http://t.co/gFcUPOm5pI) shown on two edges, each carrying a weight as an edge property]
  ‣Mentions, Weight = 30
  ‣Retweet, Weight = 100
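As a rough illustration of how edge-type weights might be applied, the sketch below maps each interaction type to a weight and sums the inbound weight per user. The weights for mentions (30) and retweets (100) come from the slide; the reply weight of 60, the class name `EdgeWeights` and the user names are assumptions made purely for illustration. Summed inbound weight is only a naive first indicator of influence, not the Page Rank calculation used later in the deck.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EdgeWeights {
    // "mentions" and "retweets" weights are taken from the slide.
    // The "replies" weight is a hypothetical value chosen to sit between them.
    static final Map<String, Double> WEIGHT_BY_TYPE = Map.of(
            "mentions", 30.0,
            "replies", 60.0,
            "retweets", 100.0);

    /** One directed interaction: the source user acts on the target user. */
    static final class Interaction {
        final String source;
        final String target;
        final String type;

        Interaction(String source, String target, String type) {
            this.source = source;
            this.target = target;
            this.type = type;
        }
    }

    /** Sums the weights of all inbound edges for each target user. */
    static Map<String, Double> inboundWeight(List<Interaction> interactions) {
        Map<String, Double> totals = new HashMap<>();
        for (Interaction i : interactions) {
            Double weight = WEIGHT_BY_TYPE.get(i.type);
            if (weight == null) {
                throw new IllegalArgumentException("Unknown edge type: " + i.type);
            }
            totals.merge(i.target, weight, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the case study.
        List<Interaction> sample = List.of(
                new Interaction("userA", "userB", "retweets"),
                new Interaction("userC", "userB", "mentions"),
                new Interaction("userB", "userA", "replies"));
        System.out.println(inboundWeight(sample));
    }
}
```

Keeping the weights in one lookup table makes it easy to rerun the analysis with different relative weights and see how sensitive the ranking is to them.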
  • 21. Oracle Big Data Spatial & Graph
  •Graph, spatial and raster data processing for big data
  ‣Primarily documented + tested against Oracle BDA
  ‣Installable on commodity cluster using CDH
  •Data stored in Apache HBase or Oracle NoSQL DB
  ‣Complements Spatial & Graph in Oracle Database
  ‣Designed for trillions of nodes, edges etc
  •Out-of-the-box spatial enrichment services
  •Over 35 of most popular graph analysis functions
  ‣Graph traversal, recommendations
  ‣Finding communities and influencers
  ‣Pattern matching
  • 22. Oracle Big Data Graph and Spatial Architecture
  •Data loaded from files or through Java API into HBase
  •In-Memory Analytics layer runs common graph and spatial algorithms on data
  •Visualised using R or other graphics packages
  ‣Massively Scalable Graph Store: Oracle NoSQL, HBase
  ‣Lightning-Fast In-Memory Analytics: YARN Container, Standalone Server, Embedded
  • 23. Preparing Vertices and Edges for Ingestion
  •ODI12c used to prepare two files in Oracle Flat File Format
  ‣Extracted vertices and edges from existing data in Hive
  ‣Wrote vertices (Twitter users) to .opv file, edges (RTs, replies etc) to .ope file
  •For exercise, only considered 2-3 days of tweets
  ‣Did not include follows (user A followed user B) as not reported by Twitter Streaming API
  ‣Could approximate larger follower networks through multiplying weight of edge by follower scale
  -Useful for Page Rank, but does it skew actual detection of influencers in exercise?
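The follower-scale idea can be sketched as below. The slide does not say how the scale is computed, so the logarithmic formula here (1 + log10(1 + followers)) is purely an assumption, as are the class and method names. A logarithmic scale is one way to stop accounts with very large follower counts from swamping every other signal, which is the skew the slide asks about.

```java
final class FollowerScaledWeight {
    /**
     * Hypothetical follower scale: 1 + log10(1 + followers).
     * An account with no followers leaves the edge weight unchanged.
     */
    static double followerScale(long followers) {
        if (followers < 0) {
            throw new IllegalArgumentException("followers must be >= 0");
        }
        return 1.0 + Math.log10(1.0 + followers);
    }

    /** Multiplies the base edge weight by the follower scale of the user who tweeted. */
    static double scaledWeight(double baseWeight, long sourceFollowers) {
        return baseWeight * followerScale(sourceFollowers);
    }

    public static void main(String[] args) {
        // A retweet (base weight 100 on the earlier slide) from accounts of different sizes.
        System.out.println(scaledWeight(100.0, 0));
        System.out.println(scaledWeight(100.0, 999));
        System.out.println(scaledWeight(100.0, 99_999));
    }
}
```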
  • 24. Oracle Flat File Format Vertices and Edge Files
  Vertex File (.opv)
  • Unique ID for the vertex
  • Property name (“name”)
  • Property value datatype (1 = String)
  • Property value (“markrittman”)
  Edge File (.ope)
  • Unique ID for the edge
  • Leading edge vertex ID
  • Trailing edge vertex ID
  • Edge Type (“mentions”)
  • Edge Property (“weight”)
  • Edge Property datatype and value
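To make the field order concrete, here is a minimal sketch that formats one vertex line and one edge line. The field order follows the slide, and datatype code 1 for String is stated there. The separate string, numeric and date value slots, the datatype code 4 for double, and the reading of “leading” as the source vertex are assumptions based on recollection of the Oracle documentation, so check them against the product documentation before relying on them. The class and method names are hypothetical, and values containing commas are not escaped.

```java
final class FlatFileLines {
    static final int TYPE_STRING = 1; // stated on the slide
    static final int TYPE_DOUBLE = 4; // assumption, verify against the Oracle documentation

    /** Vertex line: id, property name, datatype, then string / numeric / date value slots. */
    static String vertexLine(long vertexId, String key, String value) {
        return vertexId + "," + key + "," + TYPE_STRING + "," + value + ",,";
    }

    /** Edge line: id, source vertex, target vertex, edge type, property name, datatype, value slots. */
    static String edgeLine(long edgeId, long sourceId, long targetId,
                           String label, String key, double value) {
        return edgeId + "," + sourceId + "," + targetId + "," + label + ","
                + key + "," + TYPE_DOUBLE + ",," + value + ",";
    }

    public static void main(String[] args) {
        // prints 1,name,1,markrittman,,
        System.out.println(vertexLine(1, "name", "markrittman"));
        // prints 1,1,2,mentions,weight,4,,30.0,
        System.out.println(edgeLine(1, 1, 2, "mentions", "weight", 30.0));
    }
}
```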
  • 25. Loading Edges and Vertices into HBase (uses “Gremlin” shell for HBase)
  cfg = GraphConfigBuilder.forPropertyGraphHbase()
    .setName("connectionsHBase")
    .setZkQuorum("bigdatalite").setZkClientPort(2181)
    .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3)
    .setInitialVertexNumRegions(3).setSplitsPerRegion(1)
    .addEdgeProperty("weight", PropertyType.DOUBLE, "1000000")
    .build();
  opg = OraclePropertyGraph.getInstance(cfg);
  opg.clearRepository();
  vfile="../../data/biwa_connections.opv"
  efile="../../data/biwa_connections.ope"
  opgdl=OraclePropertyGraphDataLoader.getInstance();
  opgdl.loadData(opg, vfile, efile, 2);
  // read through the vertices
  opg.getVertices();
  // read through the edges
  opg.getEdges();
  • Creates connection to HBase
  • Sets initial configuration for database
  • Builds the database ready for load
  • Defines location of Vertex and Edge files
  • Creates instance of OraclePropertyGraphDataLoader
  • Loads data from files
  • Prepares the property graph for use
  • Loads in Edges and Vertices
  • Now ready for in-memory processing
  • 26. Calculating Most Influential Tweeters Using Page Rank
  vOutput="/tmp/mygraph.opv"
  eOutput="/tmp/mygraph.ope"
  OraclePropertyGraphUtils.exportFlatFiles(opg, vOutput, eOutput, 2, false);
  session = Pgx.createSession("session-id-1");
  analyst = session.createAnalyst();
  graph = session.readGraphWithProperties(opg.getConfig());
  rank = analyst.pagerank(graph, 0.001, 0.85, 100);
  rank.getTopKValues(10);
  ==>PgxVertex with ID 1=0.13885623487462861
  ==>PgxVertex with ID 3=0.08686102641801993
  ==>PgxVertex with ID 101=0.06757752513733056
  ==>PgxVertex with ID 6=0.06743774001139484
  ==>PgxVertex with ID 37=0.0481517609757462
  ==>PgxVertex with ID 17=0.042234536894569276
  ==>PgxVertex with ID 29=0.04109794527311113
  ==>PgxVertex with ID 65=0.032058649698044187
  ==>PgxVertex with ID 15=0.023075360575195276
  ==>PgxVertex with ID 93=0.019265959946506813
  • Initiates an in-memory analytics session
  • Runs Page Rank algorithm to determine influencers
  • Outputs top ten vertices (users)
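For readers who want to see what a Page Rank calculation does in outline, the following is a plain-Java sketch of the textbook power-iteration method. It is not the Oracle in-memory analytics implementation, and it ignores edge weights. The argument order (tolerance, damping factor, maximum iterations) is assumed to mirror the values 0.001, 0.85 and 100 passed on the slide; the class name and the tiny sample graph are hypothetical.

```java
import java.util.Arrays;

final class PageRankSketch {
    /**
     * Unweighted Page Rank by power iteration.
     * Vertices are numbered 0..n-1 and each edge is {source, target}.
     * Rank held by vertices with no outgoing edges is spread evenly over all vertices.
     */
    static double[] pageRank(int n, int[][] edges, double tolerance,
                             double damping, int maxIterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    danglingMass += rank[v];
                }
            }
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * danglingMass / n);
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            // Stop once the total change between iterations falls below the tolerance.
            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                delta += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (delta < tolerance) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: users 1, 2 and 3 all reference user 0, and user 0 references user 1.
        int[][] edges = {{1, 0}, {2, 0}, {3, 0}, {0, 1}};
        System.out.println(Arrays.toString(pageRank(4, edges, 0.001, 0.85, 100)));
    }
}
```

In the sample graph, user 0 ends up with the highest rank because three users reference it, and user 1 comes second because it is referenced by the highest-ranked user, which is the effect the case study relies on to surface influencers.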
  • 27. Calculating Most Influential Tweeters Using Page Rank
  v1=opg.getVertex(1l); v2=opg.getVertex(3l); v3=opg.getVertex(101l);
  v4=opg.getVertex(6l); v5=opg.getVertex(37l); v6=opg.getVertex(17l);
  v7=opg.getVertex(29l); v8=opg.getVertex(65l); v9=opg.getVertex(15l);
  v10=opg.getVertex(93l);
  System.out.println("Top 10 influencers: \n " + v1.getProperty("name") +
    "\n " + v2.getProperty("name") + "\n " + v3.getProperty("name") +
    "\n " + v4.getProperty("name") + "\n " + v5.getProperty("name") +
    "\n " + v6.getProperty("name") + "\n " + v7.getProperty("name") +
    "\n " + v8.getProperty("name") + "\n " + v9.getProperty("name") +
    "\n " + v10.getProperty("name"));
  Top 10 influencers:
   markrittman
   rmoff
   rittmanmead
   mRainey
   JeromeFr
   Nephentur
   borkur
   BIExperte
   i_m_dave
   dw_pete
  Note: Twitter users referencing the RM website + staff accounts over a 3-day period in May 2015
  • 28. Visualising Property Graphs with Cytoscape
  •Open source graph analysis tool with Oracle Big Data Graph and Spatial Plug-in
  •Available shortly from Oracle, connects to Oracle NoSQL or HBase and runs Page Rank etc
  •Alternative to command-line for In-Memory Analytics once base graph created
  • 29. Calculating Top 10 Users using Page Rank Algorithm
  Top 10 influencers: markrittman, rmoff, rittmanmead, mRainey, JeromeFr, Nephentur, borkur, BIExperte, i_m_dave, dw_pete
  • 30. Visualising the Social Graph Around Particular Users
  • 31. Calculating Shortest Path Between Users
  • 32. Edge Bundling to Better Illustrate Connection Frequency
  • 33. Determining Communities via Twitter Interactions
  • 34. Determining Communities via Twitter Interactions
  • Clusters based on actual interaction patterns, not hashtags
  • Detects real communities, not ones that exist just in theory
  • 35. Conclusions, and Further Reading
  •Tools such as OBIEE are great for understanding “what” (counts, page views, popular items)
  •Oracle Big Data Discovery can be useful for understanding “why?” (sentiment, terms etc)
  •Graph Analysis can help answer “who?”
  •Who are our audience? What are our communities? Who are their important influencers?
  •Oracle Big Data Graph and Spatial can answer these questions at “big data” scale
  •Articles on the Rittman Mead Blog
  ‣http://www.rittmanmead.com/category/oracle-big-data-appliance/
  ‣http://www.rittmanmead.com/category/big-data/
  ‣http://www.rittmanmead.com/category/oracle-big-data-discovery/
  •Rittman Mead offer consulting, training and managed services for Oracle Big Data
  ‣http://www.rittmanmead.com/bigdata
  • 36. Oracle Big Data Spatial & Graph
 Social Media Analysis - Case Study Mark Rittman, CTO, Rittman Mead BIWA Summit 2016, San Francisco, January 2016