Discussion for Anomaly & Prediction Engine

Copyright 2016 FUJITSU LIMITED
04 Feb. 2016
Hisashi Osanai

Agenda
- POC Introduction
- POC Demo
- System Configuration
- Parallel distributed processing platform
- Ex. Batch process / Stream process
- Findings/Problems from POC
- Why I’m interested in Monasca…
- Current Concerns and Approach

POC Demo

Demo System Configuration
[Figure: demo system configuration]
- Master server: OS, JDK, Hadoop, Spark, fluentd (collection/store definition), RabbitMQ, and the parallel distributed processing platform (process definition, stream process with SparkStreaming/SparkSQL, batch process, data converter, task controller)
- Visualization server: OS, JDK, Elasticsearch, Apache (httpd), Kibana
- Slave servers #1 to #3: OS, JDK, Hadoop, Spark
- Data collection targets (target servers #1, #2, ..., #n): OS, fluentd (collection definition)

Parallel distributed processing platform
[Figure: parallel distributed processing platform]
- Processing: Apache Spark (Core), SparkSQL (SQL query), SparkStreaming (event stream processing), Job Definition (XML)
- Data collection and messaging: Fluentd (data collector), RabbitMQ (message broker)
- Storage, search, and visualization: HDFS (distributed file system), ElasticSearch (real-time search engine), Kibana (data visualization)
- Flow: stream data reception → data processing with SQL → creation of time-series data → analysis process
Ex. “Stream data analysis” in the anomaly detection process
- Can execute both stream processes and batch processes
- Fast data conversion driven by XML-based job definitions
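The slides do not show the job definition format itself. As a minimal sketch, assuming a hypothetical schema (a `<job>` element containing numbered `<task>` elements), the snippet below parses such a definition with Python's standard `xml.etree.ElementTree` and dispatches each task, in id order, to a handler registered for its action. `JOB_XML`, `parse_job`, `run_job`, and all element and attribute names are illustrative assumptions, not the POC's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition schema: the POC's real XML format is not shown
# in the slides, so <job>, <task>, and their attributes are illustrative only.
JOB_XML = """
<job name="web-access-analysis" type="batch">
  <task id="1" action="read" source="master_data"/>
  <task id="2" action="read" source="web_access_log"/>
  <task id="3" action="query_and_save" target="result"/>
</job>
"""

def parse_job(xml_text):
    """Parse a job definition into (job name, job type, ordered task list)."""
    root = ET.fromstring(xml_text.strip())
    tasks = [dict(task.attrib) for task in root.findall("task")]
    # Run tasks in ascending id order regardless of their order in the file.
    tasks.sort(key=lambda t: int(t["id"]))
    return root.get("name"), root.get("type"), tasks

def run_job(xml_text, handlers):
    """Dispatch each task to the handler registered for its action."""
    name, job_type, tasks = parse_job(xml_text)
    results = [handlers[task["action"]](task) for task in tasks]
    return name, job_type, results

if __name__ == "__main__":
    handlers = {
        "read": lambda t: f"read {t['source']}",
        "query_and_save": lambda t: f"saved to {t['target']}",
    }
    print(run_job(JOB_XML, handlers))
```

Keeping the task list in data rather than code is what lets a new conversion be set up by editing XML alone; the handlers stand in for the Spark batch or streaming logic.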

Ex. Batch process
[Figure: batch process]
- A job definition (XML) drives a Spark batch application running on the Spark cluster:
  - TASK:1 Read “master data”
  - TASK:2 Read “Web access log”
  - TASK:3 Query and save the analysis result
- Storage: HDFS
- Analyze a large volume of web access logs stored on the file system
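The three batch tasks can be sketched in plain Python to show the data flow: read master data, read access logs, then join and aggregate. This is a minimal, single-process sketch, not the Spark application from the POC, and it assumes hypothetical inputs: master data mapping a URL path to a page category, and simplified log lines of the form `<ip> <method> <path> <status>`. The function names and the in-memory `result_store` (standing in for the HDFS output) are illustrative.

```python
from collections import Counter

# Hypothetical inputs: the slides do not show the real "master data" or log
# format. Master data maps a URL path to a page category; each log line is a
# simplified "<ip> <method> <path> <status>" entry.
MASTER_DATA = {"/index.html": "top", "/items/1": "product", "/items/2": "product"}

ACCESS_LOG = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /items/1 200",
    "10.0.0.3 GET /items/2 404",
    "10.0.0.1 GET /items/1 200",
]

def read_master_data():  # TASK:1 Read "master data"
    return dict(MASTER_DATA)

def read_access_log():  # TASK:2 Read "Web access log"
    for line in ACCESS_LOG:
        ip, method, path, status = line.split()
        yield {"ip": ip, "method": method, "path": path, "status": int(status)}

def query_and_save(master, records, sink):  # TASK:3 Query and save
    # Count successful (2xx) accesses per page category; unknown paths -> "other".
    counts = Counter(
        master.get(r["path"], "other")
        for r in records
        if 200 <= r["status"] < 300
    )
    sink.update(counts)  # sink stands in for the HDFS output location
    return counts

if __name__ == "__main__":
    result_store = {}
    print(query_and_save(read_master_data(), read_access_log(), result_store))
```

On a Spark cluster the same query would be a join of the two datasets followed by a group-by, with the work spread across the slave servers.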

Ex. Stream process
[Figure: stream process]
- Data from the target server passes through RabbitMQ to the RabbitMQ receiver of a Spark Streaming application, which is driven by a job definition (XML):
  - TASK:1 Process and store the CPU information
  - TASK:2 Process and store the MEM information
- Processed data is stored in HDFS for analysis
- Analyze CPU and memory statistics in real time
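Spark Streaming processes an incoming stream as a series of small batches. The sketch below imitates that model in plain Python so the two tasks are easy to follow: messages are grouped into micro-batches (here by count, whereas Spark Streaming uses a time-based batch interval), and each batch yields per-host CPU and memory averages. The JSON message format, `micro_batches`, `process_batch`, and the list used in place of HDFS are all assumptions for illustration.

```python
import json
from collections import defaultdict

# Hypothetical message format: the slides do not show what the target servers
# send through RabbitMQ, so each message here is a JSON object with a host
# name, a metric type ("cpu" or "mem"), and a percentage value.
MESSAGES = [
    '{"host": "web1", "type": "cpu", "value": 20.0}',
    '{"host": "web1", "type": "mem", "value": 55.0}',
    '{"host": "web1", "type": "cpu", "value": 40.0}',
    '{"host": "web2", "type": "cpu", "value": 90.0}',
    '{"host": "web1", "type": "mem", "value": 65.0}',
]

def micro_batches(stream, size):
    """Group an incoming stream into fixed-size micro-batches (by count here;
    Spark Streaming groups input by time interval instead)."""
    batch = []
    for msg in stream:
        batch.append(msg)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def process_batch(batch, store):
    """TASK:1 / TASK:2: route CPU and MEM readings, store per-host averages."""
    sums = defaultdict(lambda: [0.0, 0])  # (type, host) -> [total, count]
    for raw in batch:
        m = json.loads(raw)
        acc = sums[(m["type"], m["host"])]
        acc[0] += m["value"]
        acc[1] += 1
    averages = {key: total / n for key, (total, n) in sums.items()}
    store.append(averages)  # stands in for writing to HDFS
    return averages

if __name__ == "__main__":
    store = []
    for batch in micro_batches(MESSAGES, size=3):
        print(process_batch(batch, store))
```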

Findings/Problems from POC
- Data collection on target servers requires manpower
  - We have to discuss with each customer which data to collect and then configure the fluentd agents, so the number of POCs we can run is limited
- Difficult to accumulate IT analytics experience
  - Data and data formats differ for each customer, so the suitable anomaly detection libraries also differ
- Difficult to keep up with anomaly detection libraries
  - Machine learning technology (MLlib, TensorFlow, CNTK, and so on) is evolving rapidly

Why I’m interested in Monasca…
- Monasca seems to solve two of the problems found in the POC
  - Data collection on target servers requires manpower
    • Monasca provides agents for OpenStack environments, so we can simply use them.
  - Difficult to accumulate IT analytics experience
    • Data comes from Monasca agents and its format is stable, so we can use it as a stable input and look for “which libraries are suitable for an environment monitored by Monasca”.
- Adds an eye-catching feature to Monasca
- Boosts Monasca sales
  • Many of our customers are interested in IT analytics
  • Fujitsu sells a Monasca-based product
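Because Monasca agents deliver data in a stable format, a detector can be written once against that format and reused across environments. A minimal sketch, assuming metrics shaped like Monasca metrics (name, dimensions, timestamp in milliseconds, value) with made-up sample values: it groups points into series by name and dimensions and flags points far from a leave-one-out mean. The z-score technique, the threshold, and `zscore_anomalies` are illustrative choices, not a Monasca API or the POC's method.

```python
import statistics

# Sample metrics shaped like Monasca metrics (name, dimensions, timestamp in
# ms, value). The values are made up for illustration.
METRICS = [
    {"name": "cpu.idle_perc", "dimensions": {"hostname": "node1"},
     "timestamp": 1454544000000 + i * 60000, "value": v}
    for i, v in enumerate([95, 94, 96, 95, 93, 94, 96, 20, 95])
]

def zscore_anomalies(metrics, threshold=3.0):
    """Return metrics whose value lies more than `threshold` standard
    deviations from the mean of the other points in the same series."""
    # A series is identified by metric name plus its full set of dimensions.
    series = {}
    for m in metrics:
        key = (m["name"], tuple(sorted(m["dimensions"].items())))
        series.setdefault(key, []).append(m)

    anomalies = []
    for points in series.values():
        values = [p["value"] for p in points]
        for i, point in enumerate(points):
            others = values[:i] + values[i + 1:]  # leave-one-out baseline
            if len(others) < 2:
                continue
            mean = statistics.mean(others)
            sd = statistics.pstdev(others)
            if sd > 0 and abs(point["value"] - mean) / sd > threshold:
                anomalies.append(point)
    return anomalies

if __name__ == "__main__":
    for a in zscore_anomalies(METRICS):
        print(a["timestamp"], a["dimensions"]["hostname"], a["value"])
```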

Current Concerns & Approach
- Current concerns
  - Performance of real-time anomaly detection (Apache Storm vs. Spark Streaming)
  - Rapid evolution of machine learning technology (a plugin architecture for the libraries is needed)
- Approach (a base for discussion)
  - How should we move Anomaly & Prediction Engine (APE) development forward?
  - Idea
    • First, rebase the current prototype on Monasca master (if possible, I would like to do this with Roland’s help)
    • Then use it to find problems
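A plugin architecture would let new machine learning libraries be wrapped behind one interface and selected by configuration. A minimal sketch, assuming a hypothetical detector interface (a function that takes a list of values and returns the indexes it considers anomalous): detectors register under a name, and the engine looks them up by that name. `register`, `get_detector`, and both example detectors are illustrative names, not existing Monasca or APE code.

```python
from typing import Callable, Dict, List

# Hypothetical plugin interface: a detector takes a list of values and returns
# the indexes it considers anomalous.
Detector = Callable[[List[float]], List[int]]
_REGISTRY: Dict[str, Detector] = {}

def register(name: str):
    """Decorator that registers a detector under a configurable name."""
    def wrap(func: Detector) -> Detector:
        if name in _REGISTRY:
            raise ValueError(f"detector {name!r} already registered")
        _REGISTRY[name] = func
        return func
    return wrap

def get_detector(name: str) -> Detector:
    """Look up a detector by name, listing available ones on failure."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(
            f"no detector named {name!r}; available: {sorted(_REGISTRY)}"
        ) from None

@register("threshold")
def threshold_detector(values, limit=90.0):
    # Flag values above a fixed limit.
    return [i for i, v in enumerate(values) if v > limit]

@register("jump")
def jump_detector(values, max_step=30.0):
    # Flag values that change by more than max_step from the previous value.
    return [i for i in range(1, len(values)) if abs(values[i] - values[i - 1]) > max_step]

if __name__ == "__main__":
    values = [10.0, 12.0, 95.0, 11.0]
    for name in ("threshold", "jump"):
        print(name, get_detector(name)(values))
```

Adding support for a new library then means writing one more registered function; a packaged version could discover plugins through Python entry points instead of a module-level dict.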
