Copyright 2016 FUJITSU LIMITED
Discussion for Anomaly & Prediction Engine
04 Feb. 2016
Hisashi Osanai

Agenda
- POC Introduction
- POC Demo
- System Configuration
- Parallel distributed processing platform
- Ex. Batch process / Stream process
- Findings/Problems from POC
- Why I’m interested in Monasca…
- Current Concerns and Approach

POC Demo

Demo System Configuration
[Figure: demo system configuration, summarized from the slide diagram]
- Master server: OS, JDK, Hadoop, Spark, fluentd (collection/store definition), RabbitMQ, and the parallel distributed processing platform (process definition, stream process with SparkStreaming/SparkSQL, batch process, task controller, data converter)
- Visualization server: OS, JDK, Elasticsearch, Apache (httpd), Kibana, fluentd (collection/store definition)
- Slave servers #1 to #3: OS, JDK, Hadoop, Spark
- Target servers #1 to #n (data collection targets): OS, fluentd (collection definition)

Parallel distributed processing platform
[Figure: platform components and data flow]
- Fluentd (data collector)
- RabbitMQ (message broker)
- Parallel distributed processing platform: Apache Spark (Core) with SparkSQL (SQL query) and SparkStreaming (event stream processing), driven by a Job Definition (XML)
- HDFS (distributed file system)
- Elasticsearch (real-time search engine)
- Kibana (data visualization)
Ex. “stream data analysis” in the anomaly detection process: stream data reception -> data processing with SQL -> creation of time-series data -> analysis process
- Can execute both stream processes and batch processes
- Fast-acting data conversion based on an XML-based Job Definition
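
The slides do not show what a Job Definition looks like, so the following is only a minimal sketch of the idea of driving tasks from XML. The element names (`job`, `task`), the attributes (`id`, `action`, `source`, `target`, `mode`) and the helper `load_job` are all hypothetical and are not taken from the POC.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition: every element and attribute name here is an
# illustration, not the schema used in the POC.
JOB_XML = """
<job name="web-access-analysis" mode="batch">
  <task id="2" action="read" source="web_access_log"/>
  <task id="1" action="read" source="master_data"/>
  <task id="3" action="query_and_save" target="hdfs"/>
</job>
"""


def load_job(xml_text):
    """Parse a job definition into a dict with its tasks ordered by id."""
    root = ET.fromstring(xml_text)
    tasks = [dict(task.attrib) for task in root.findall("task")]
    tasks.sort(key=lambda t: int(t["id"]))
    return {"name": root.get("name"), "mode": root.get("mode"), "tasks": tasks}


job = load_job(JOB_XML)
for task in job["tasks"]:
    # A real platform would dispatch each task to Spark here.
    print(task["id"], task["action"])
```

Keeping the task list in a declarative file like this is one way a platform could switch between batch and stream jobs without code changes, which seems to be the intent of the Job Definition on the slide.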

Ex. Batch process
[Figure: batch process on the parallel distributed processing platform]
A SparkBatch application on the Spark cluster runs the tasks listed in the Job definition (XML), with the data held on HDFS:
- TASK 1: Read “master data”
- TASK 2: Read “Web access log”
- TASK 3: Query and save (Web access log analysis)
- Analyzes a large volume of Web access logs on the file system
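
The three batch tasks can be sketched without a cluster. The example below is a stand-in, not the POC code: it uses Python's built-in `sqlite3` in place of SparkSQL and in-memory lists in place of HDFS files, and the table names, columns and sample rows are assumptions made for illustration.

```python
import sqlite3

# Illustrative inputs; the POC reads these from HDFS instead.
MASTER_DATA = [("/index.html", "Top page"), ("/cart", "Shopping cart")]
ACCESS_LOG = [
    ("10.0.0.1", "/index.html", 200),
    ("10.0.0.2", "/cart", 500),
    ("10.0.0.1", "/cart", 200),
    ("10.0.0.3", "/index.html", 200),
]


def run_batch(master_rows, log_rows):
    """Load both inputs, join them with SQL, save and return the summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE master (path TEXT, title TEXT)")
    conn.execute("CREATE TABLE access_log (client TEXT, path TEXT, status INTEGER)")
    # TASK 1: read "master data"
    conn.executemany("INSERT INTO master VALUES (?, ?)", master_rows)
    # TASK 2: read "Web access log"
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", log_rows)
    # TASK 3: query and save the result into its own table
    conn.execute(
        """
        CREATE TABLE result AS
        SELECT m.title AS title,
               COUNT(*) AS hits,
               SUM(CASE WHEN a.status >= 500 THEN 1 ELSE 0 END) AS errors
        FROM access_log a JOIN master m ON a.path = m.path
        GROUP BY m.title
        """
    )
    rows = conn.execute(
        "SELECT title, hits, errors FROM result ORDER BY title"
    ).fetchall()
    conn.close()
    return rows


summary = run_batch(MASTER_DATA, ACCESS_LOG)
print(summary)
```

The same join-and-aggregate query could in principle be expressed in SparkSQL; only the storage and execution layers differ.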

Ex. Stream process
[Figure: stream process on the parallel distributed processing platform]
Data from the target servers arrives through RabbitMQ. A RabbitMQ receiver feeds a Spark Streaming application, which runs the tasks listed in the Job definition (XML) and stores the results on HDFS for analysis:
- TASK 1: Process and store the CPU information
- TASK 2: Process and store the MEM information
- Analyzes statistics (CPU/MEM) in real time
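
As a rough illustration of the stream flow, the sketch below consumes CPU and MEM samples from a queue, stores them, and flags values that deviate strongly from the recent window. It is an assumption-laden stand-in: `queue.Queue` replaces RabbitMQ, a Python list replaces HDFS, and the rolling z-score check, the window size and the threshold are illustrative choices, not the analysis used in the POC.

```python
import queue
import statistics
from collections import defaultdict, deque

WINDOW = 5       # recent samples kept per (host, metric); assumed value
THRESHOLD = 3.0  # z-score above which a sample is flagged; assumed value


def process_stream(messages, window=WINDOW, threshold=THRESHOLD):
    """Drain the queue; return (stored samples, samples flagged as anomalies)."""
    history = defaultdict(lambda: deque(maxlen=window))
    stored, anomalies = [], []
    while True:
        try:
            msg = messages.get_nowait()
        except queue.Empty:
            break
        past = history[(msg["host"], msg["metric"])]
        # Only judge a sample once a full window of history exists.
        if len(past) == window:
            mean = statistics.mean(past)
            stdev = statistics.pstdev(past)
            if stdev > 0 and abs(msg["value"] - mean) / stdev > threshold:
                anomalies.append(msg)
        past.append(msg["value"])
        stored.append(msg)  # stand-in for "process and store" on HDFS
    return stored, anomalies


q = queue.Queue()
for value in [20.0, 22.0, 21.0, 19.0, 20.0, 95.0]:
    q.put({"host": "target-1", "metric": "cpu", "value": value})
for value in [40.0, 41.0, 40.5]:
    q.put({"host": "target-1", "metric": "mem", "value": value})

stored, anomalies = process_stream(q)
print(len(stored), [m["value"] for m in anomalies])
```

Each (host, metric) pair keeps its own window, so CPU and MEM samples are judged independently, mirroring the separate TASK 1 and TASK 2 on the slide.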

Findings/Problems from POC
- Data collection on target servers needs manpower
  - We have to discuss with each customer which data to collect and then configure the fluentd agents, so the number of POCs is limited
- Difficult to accumulate IT analytics experience
  - Data and its format differ from customer to customer, so the suitable anomaly detection libraries also differ
- Difficult to keep up with anomaly detection libraries
  - Machine learning technology such as MLlib, TensorFlow and CNTK evolves rapidly

Why I’m interested in Monasca…
- Monasca seems to solve two problems from the POC
  - Data collection on target servers needs manpower
    - Monasca provides agents for OpenStack environments, so we can just use them.
  - Difficult to accumulate IT analytics experience
    - Data comes from Monasca agents in a stable format, so we use it as stable input and look for the libraries that suit an environment monitored by Monasca.
  - Add a catching function to Monasca
- Boosts Monasca sales
  - Many of our customers are interested in IT analytics
  - Fujitsu sells a Monasca-based product

Current Concerns & Approach
- Current Concerns
  - Performance of real-time anomaly detection (Apache Storm vs. Spark Streaming)
  - Rapid evolution of machine learning technology (a plugin architecture for the libraries is needed)
- Approach (a basis for discussion)
  - How do we move Anomaly & Prediction Engine (APE) development ahead?
  - Idea
    - First, rebase the current prototype on Monasca master (if possible, I would like to do this with Roland’s help)
    - Then use it to find problems