WSO2 Big Data Platform and Applications
Srinath Perera
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
What Can We Do with Big Data?
● Optimize (the world is inefficient)
  o 30% of food is wasted from farm to plate
  o GE 1% initiative (http://goo.gl/eYC0QE)
    - A 1% saving in trains can save 2B/year
    - A 1% saving in US healthcare is 20B/year
    - In contrast, Sri Lanka's total exports are 9B/year
● Save lives
  o Weather, disease identification, personalized treatment
● Technology advancement
  o Most high-tech research is done via simulations
Big Data Architecture
Big Data Processing Technologies
WSO2 Analytics Platform
Big Data Analytics Offering
Combined Power
● Users can send events to both BAM and CEP via the same APIs
● CEP can combine output from batch processing and data from various stores (e.g. databases) with real-time processing
  o e.g. implementing the Lambda Architecture
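Combining a batch result with a real-time result can be sketched as follows. This is a minimal illustration in Python, not WSO2 code: the names `build_batch_view`, `RealtimeView`, and `query` are hypothetical, and the events are made-up dictionaries. The batch layer recomputes counts over all historical events, the speed layer counts events that arrived after the last batch run, and a query adds the two.

```python
from collections import Counter


def build_batch_view(events):
    """Batch layer: recompute per-service counts from the full history."""
    return Counter(event["serviceID"] for event in events)


class RealtimeView:
    """Speed layer: counts events not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["serviceID"]] += 1

    def reset(self):
        # Called after a new batch view has absorbed these events.
        self.counts.clear()


def query(batch_view, realtime_view, service_id):
    """Serving layer: merge the batch and real-time answers."""
    return batch_view[service_id] + realtime_view.counts[service_id]


# Hypothetical data: three historical events and one fresh event.
history = [
    {"serviceID": "orders"},
    {"serviceID": "orders"},
    {"serviceID": "billing"},
]
batch_view = build_batch_view(history)
realtime = RealtimeView()
realtime.on_event({"serviceID": "orders"})

total_orders = query(batch_view, realtime, "orders")
print(total_orders)  # 3
```

In the platform described here, the batch role would be played by BAM and the real-time role by CEP; the sketch only shows the merge step.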
Highly Pluggable Architecture
WSO2 CEP
WSO2 BAM
● Powered by Apache Hadoop, with management and queries using Apache Hive
● Parallel, distributed processing based on the MapReduce programming model
● Runs on a local Hadoop node or can be delegated to a cluster of Hadoop nodes
● Scalable script-based analytics written in an easy-to-learn, SQL-like query language
[Diagram: Analyzer Engine, Hadoop Cluster, Data Store (Cassandra/RDBMS)]
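The MapReduce programming model mentioned above can be illustrated with a small in-memory sketch in Python. It is an assumption-laden toy, not Hadoop or BAM code: the function names and the sample records are hypothetical. It counts requests per service, which is roughly the kind of aggregation a SQL-like `GROUP BY` query expresses.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (key, 1) pair for every record."""
    for record in records:
        yield record["serviceID"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical request log.
records = [
    {"serviceID": "orders"},
    {"serviceID": "billing"},
    {"serviceID": "orders"},
]
request_counts = reduce_phase(shuffle(map_phase(records)))
print(request_counts)  # {'orders': 2, 'billing': 1}
```

On a real cluster the map and reduce steps run in parallel on different nodes; here they run sequentially in one process to keep the example self-contained.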
High Level Languages
● For both batch and real-time processing, we provide structured, SQL-like query languages
  o No Java programming is required
● Lowers the adoption entry point
● BAM
  o Relies on Apache Hive
● CEP
  o Implemented through our own solution, Siddhi
CEP Operators with Siddhi
● Event table: maps a database as an event stream
● Filter: processes a single transaction
● Window: tracks a window of events

define stream RequestStream (correlationID string, serviceID string, userID string, tier string, requestTime long, ...);
define table BlacklistedUserTable (userID string, time long, requestCount long);

from RequestStream[tier == 'BRONZE']#window.time(1 min)
select userID, requestTime as time, count(correlationID) as requestCount
group by userID
having requestCount > 5
insert into BlacklistedUserTable;
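To make the filter, window, and event table operators concrete, here is a rough Python approximation of what the query above computes. It is a sketch under stated assumptions, not Siddhi's implementation: the function `find_blacklisted`, the field name `tier` for the service tier, and the sample events are all hypothetical, events are assumed to arrive ordered by `requestTime` in milliseconds, and a plain dictionary stands in for `BlacklistedUserTable`.

```python
from collections import defaultdict, deque

WINDOW_MS = 60_000  # corresponds to window.time(1 min)
THRESHOLD = 5       # corresponds to "having requestCount > 5"


def find_blacklisted(events):
    """Return {userID: (time, requestCount)} for users over the threshold."""
    recent = defaultdict(deque)  # userID -> request times inside the window
    blacklist = {}               # stand-in for the event table
    for event in events:
        # Filter: only BRONZE-tier requests are considered.
        if event["tier"] != "BRONZE":
            continue
        now = event["requestTime"]
        window = recent[event["userID"]]
        # Window: drop requests older than one minute.
        while window and window[0] <= now - WINDOW_MS:
            window.popleft()
        window.append(now)
        # Having + insert: record the user once the count exceeds 5.
        if len(window) > THRESHOLD:
            blacklist[event["userID"]] = (now, len(window))
    return blacklist


# Hypothetical traffic: alice sends six BRONZE requests within six seconds.
events = [
    {"userID": "alice", "tier": "BRONZE", "requestTime": t * 1000}
    for t in range(6)
]
events.append({"userID": "bob", "tier": "GOLD", "requestTime": 6000})
events.append({"userID": "carol", "tier": "BRONZE", "requestTime": 7000})

blacklisted = find_blacklisted(events)
print(blacklisted)  # {'alice': (5000, 6)}
```

The sketch evaluates the condition only when a new event arrives, which is a simplification of how a streaming engine handles expiring events.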
Smart Home
● DEBS (Distributed Event-Based Systems) is a premier academic conference that poses a yearly event processing challenge (http://www.cse.iitb.ac.in/debs2014/?page_id=42)
● Smart home electricity data: 2000 sensors, 40 houses, 4 billion events
● We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec in distributed throughput
● The WSO2 CEP-based solution is one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
● The only generic solution to become a finalist
Healthcare Data Monitoring
● Allows users to search, visualize, and analyze healthcare records (HL7) across 20 hospitals in Italy
● Used in combination with WSO2 ESB and BAM
● Custom toolbox tailored to the customer's requirements (to replace an existing system)
Cloud IDE Analytics
● Custom solution created in partnership with Codenvy to bring analytics to the Codenvy management team and its customers
● Developed in less than a month, with a custom plug-in for MongoDB
● Deployed on the codenvy.com platform
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
Additional Customer Use Cases
● Used in healthcare and parking monitoring
  o See "Solution patterns based approach to rapidly create IoE solutions across industries", http://us14.wso2con.com/videos/#Coumara-Radja
● Used by a large-scale IoT system provider for use cases including vehicle tracking, smart city, and building monitoring (CEP)
  o See "Internet of Big Things: The Story of Pacific Controls", http://us14.wso2con.com/videos/#Sajaad-Chaudry
● Transaction monitoring in a large bank (CEP)
● Knowledge mining and tracking prospective customers through natural language data sources (CEP)
● CEP embedded in edge devices
  o See "WSO2Con 2013 - Keynote: Emerging Foundations of Next-Generation Business Systems", https://www.youtube.com/watch?v=7CyG3JKUxWw
● Throttling and anomaly detection by a group of telecom companies
Extensions and Toolboxes
● Fraud and Anomaly Detection Toolbox (static rules, statistical outliers, Markov chains)
● Time Series Toolbox
● Natural Language Processing Plugin (entity extraction, POS tagging, sentiment analysis)
● GIS Toolbox (geo-fencing, tracking, speed alarms)
● Running machine learning models exported as PMML (e.g. from R) with CEP
● Video monitoring with OpenCV
● For more information, see http://wso2.com/library/articles/2014/08/wso2-cep-in-action-an-analysis-of-use-in-real-world-applications-of-different-domains/
Geo Fencing and Tracking Toolbox
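As a rough idea of what a geo-fencing check involves, the following Python sketch raises an alert whenever a tracked vehicle crosses a circular fence. It is not the toolbox's implementation: the function names, the circular fence shape, the fence location, and the sample positions are all assumptions made for illustration. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_fence(lat, lon, fence):
    """True if the point lies within the circular fence."""
    distance = haversine_km(lat, lon, fence["lat"], fence["lon"])
    return distance <= fence["radius_km"]


def fence_alerts(positions, fence):
    """Return (vehicle_id, 'ENTER' | 'EXIT') for each boundary crossing."""
    was_inside = {}
    alerts = []
    for position in positions:
        now_inside = inside_fence(position["lat"], position["lon"], fence)
        previous = was_inside.get(position["id"])
        if previous is not None and previous != now_inside:
            alerts.append((position["id"], "ENTER" if now_inside else "EXIT"))
        was_inside[position["id"]] = now_inside
    return alerts


# Hypothetical fence: 5 km around a point in Colombo.
fence = {"lat": 6.9271, "lon": 79.8612, "radius_km": 5.0}
positions = [
    {"id": "v1", "lat": 6.9271, "lon": 79.8612},  # inside the fence
    {"id": "v1", "lat": 7.2906, "lon": 80.6337},  # far outside
    {"id": "v1", "lat": 6.9300, "lon": 79.8600},  # back inside
]
alerts = fence_alerts(positions, fence)
print(alerts)  # [('v1', 'EXIT'), ('v1', 'ENTER')]
```

In a CEP deployment the position updates would arrive as an event stream and the alerts would be emitted as output events; the list-based version above only shows the crossing logic.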
IoT Demos and Use Cases
● SolidCon demo, http://wso2.com/library/articles/2014/09/demonstration-on-architecture-of-internet-of-things-an-analysis/
● IoT Reference Architecture, http://wso2.com/landing/internet-of-things-uk-2014/
● Internet of Big Things: The Story of Pacific Controls, http://us14.wso2con.com/videos/#Sajaad-Chaudry
● Federated Identity for IoT with OAuth, http://www.infoq.com/presentations/federated-identity-IoT-OAuth
Sentiment Analysis Demo
Analyzing sentiment for the FIFA Twitter hashtag
Work in Progress
Predictive Analytics
Leveraging Apache Storm in CEP
BAM Enhancements
● Work is underway to switch BAM to Apache Spark, with support for Shark SQL-like queries
  o Faster queries
  o Keeps the SQL-like language
● Use "Hive on Spark" for migration purposes
● Lower the adoption entry point of BAM by packaging an RDBMS by default instead of Cassandra
  o The architecture already scales from small deployments to big data
Questions?
Business Model
