Apache Eagle
• A distributed real-time Hadoop data security engine from eBay
蒋吉麟 (Jilin Jiang) | 赵晴雯 (Qingwen Zhao)
eBay
Agenda
•About Eagle
•Front End
– Evolution
– Modularization
– Features
•Back End
– Architecture
– Tech Highlights
– Integration
•Q & A
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay.
It was open sourced as an Apache Incubator project on October 26, 2015.
See http://eagle.incubator.apache.org or http://goeagle.io
Hadoop @eBay
• 2007: 1-10 nodes
• 2009: 10+ nodes
• 2010: 100+ nodes, 1,000+ cores, 1 PB
• 2011: 1,000+ nodes, 10,000+ cores, 10+ PB
• 2013: 4,000+ nodes, 40,000+ cores, 50+ PB
• 2015: 10,000+ nodes, 150,000+ cores, 150+ PB
•swf
•exe
Features
•common
•metadata
•classification
•metrics
common
•Policies
•Alerts
metadata
classification
•Tree View
•Table View
metrics
Architecture
[Architecture diagram: data collection (Kafka, YARN API, Hadoop JMX…) feeds the stream processing engine, which hosts user-profile-based anomaly detection and the policy-evaluation-based framework. Output goes to data sinks (email, Kafka…) and other remediation systems. Also shown: Eagle Storage (metadata, metrics, alerts…), user profile training, and Eagle Query.]
Tech Highlights
•Data Collection
•Stream Processing DSL
•Distributed Policy Engine
•ML-based anomaly detection
•Query Framework
Note: an identifier of the form {NAME}-{NUMBER}, such as HDFS-6914, is the ID of an open source project ticket that we contributed.
Apache Eagle – Data Collection
Decoupled via Apache Kafka
• High-throughput distributed messaging
• Easy to plug in various kinds of data sources
• Python/Java/C++ Kafka clients
Currently supported data sources
• Hadoop data
– HDFS and HBase audit logs
– GC logs
– JMX metrics
– Historical and running MapReduce job data
• …
• Generic format data
Apache Eagle – Stream Processing DSL
Ease of use
– Easily assemble data transformations, filters, joins…
Flexibility
– Independent of the physical execution platform
val env = ExecutionEnvironment.getStorm()
env.fromKafka(KafkaConfig)
   .flatMap(AuditLogTransformer)
   .groupBy(_.user)
   .flatMap(UserProfileAggregator)
   .alert.persistAndEmail
env.execute()
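The logical flow of the DSL snippet (transform, group by user, aggregate) can be sketched in plain Java over an in-memory list. This only illustrates the shape of the pipeline; it is not Eagle's DSL and does not run on Storm. `AuditEvent`, `AuditPipelineSketch`, `parse`, and `countByUser` are hypothetical names, and the `user,command` line format is an assumption made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical audit event: which user ran which command.
final class AuditEvent {
    final String user;
    final String command;

    AuditEvent(String user, String command) {
        this.user = user;
        this.command = command;
    }

    String getUser() {
        return user;
    }
}

final class AuditPipelineSketch {
    // Stand-in for the transformer step: parse "user,command" lines and skip malformed ones.
    static List<AuditEvent> parse(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .filter(parts -> parts.length == 2)
                .map(parts -> new AuditEvent(parts[0].trim(), parts[1].trim()))
                .collect(Collectors.toList());
    }

    // Stand-in for groupBy(_.user) followed by aggregation: count events per user.
    static Map<String, Long> countByUser(List<AuditEvent> events) {
        return events.stream()
                .collect(Collectors.groupingBy(AuditEvent::getUser, TreeMap::new, Collectors.counting()));
    }
}
```

In a real deployment the grouping step is what lets the engine spread users across parallel tasks; here it is just a map keyed by user.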
[Diagram: the DSL program above mapped to its logical stream processing graph. In a distributed streaming cluster environment, the real-time event stream passes through stream processing (Stream_{1} … Stream_{*}) to AlertExecutor_{1} … AlertExecutor_{N}, which emit alerts.]
Apache Eagle – Distributed Real-time Policy Engine
Features
• Extensibility
• Usability
• Real-time
• Scalability
• Metadata-driven
[Diagram: in a distributed streaming cluster environment, the real-time event stream passes through stream processing (Stream_{1} … Stream_{*}) to AlertExecutor_{1} … AlertExecutor_{N}, which emit real-time alerts. The Metadata Manager supplies dynamic stream schemas, and policy management deploys policies dynamically.]
Extensibility
• The default policy evaluator is WSO2 Siddhi CEP
• Powerful SQL-like event stream processing
• Open to other customized policy engines
[Diagram: the Distributed Real-time Policy Engine hosts a Siddhi CEP Policy Evaluator, a Machine Learning Policy Evaluator, and an Extensible Policy Evaluator.]
import java.util.List;

public interface PolicyEvaluatorServiceProvider {
    // Literal string that identifies one type of policy
    public String getPolicyType();
    // Policy evaluator implementation for this policy type
    public Class<? extends PolicyEvaluator> getPolicyEvaluator();
    // Modules that map JSON policy text to objects
    public List getBindingModules();
}

public interface PolicyEvaluator {
    // Evaluate one input event
    public void evaluate(ValuesArray input) throws Exception;
    // Invoked when the policy is updated
    public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef);
    // Invoked when the policy is deleted
    public void onPolicyDelete();
}
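A minimal sketch of how a custom evaluator might be plugged in by policy type. The types below are simplified stand-ins rather than Eagle's actual interfaces: events are plain maps instead of `ValuesArray`, and `SimplePolicyEvaluator`, `DenyListEvaluator`, `EvaluatorRegistry`, and the `cmd` field are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Simplified stand-in for a policy evaluator: returns true when the event violates the policy.
interface SimplePolicyEvaluator {
    boolean evaluate(Map<String, Object> event);
}

// Hypothetical custom evaluator: fires when the event's "cmd" field is on a deny list.
final class DenyListEvaluator implements SimplePolicyEvaluator {
    private final Set<String> deniedCommands;

    DenyListEvaluator(Set<String> deniedCommands) {
        this.deniedCommands = new HashSet<>(deniedCommands);
    }

    @Override
    public boolean evaluate(Map<String, Object> event) {
        Object cmd = event.get("cmd");
        return cmd != null && deniedCommands.contains(cmd.toString());
    }
}

// Maps a policy type string to an evaluator factory, mirroring the role of getPolicyType().
final class EvaluatorRegistry {
    private final Map<String, Supplier<SimplePolicyEvaluator>> factories = new HashMap<>();

    void register(String policyType, Supplier<SimplePolicyEvaluator> factory) {
        factories.put(policyType, factory);
    }

    SimplePolicyEvaluator create(String policyType) {
        Supplier<SimplePolicyEvaluator> factory = factories.get(policyType);
        if (factory == null) {
            throw new IllegalArgumentException("No evaluator registered for policy type: " + policyType);
        }
        return factory.get();
    }
}
```

Keying evaluators by a type string keeps the engine unaware of concrete evaluator classes, which is the point of the extensibility bullet above.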
[Diagram: the Metadata Manager supplies policies and metadata to the policy evaluators. Policy management deploys policies dynamically into the distributed streaming cluster environment, which emits real-time alerts.]
Usability
• Powerful SQL-like CEP query language (CQL) for policy definition
• Dynamic policy lifecycle management (deployment/update)
• Easy-to-use UI for policy management and alert analytics
from metricStream[(name == 'ReplLag') and (value > 1000)]
select *
insert into outputStream;
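To make the semantics of the policy concrete, here is a plain-Java reading of the same filter. It does not use the Siddhi API; `MetricEvent` and `ReplLagPolicySketch` are hypothetical names, and the sketch assumes `value` is numeric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metric event carrying the two attributes the policy reads.
final class MetricEvent {
    final String name;
    final double value;

    MetricEvent(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

final class ReplLagPolicySketch {
    // Plain-Java reading of:
    // from metricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;
    static List<MetricEvent> apply(List<MetricEvent> metricStream) {
        List<MetricEvent> outputStream = new ArrayList<>();
        for (MetricEvent event : metricStream) {
            if ("ReplLag".equals(event.name) && event.value > 1000) {
                outputStream.add(event); // "select *": forward the whole event unchanged
            }
        }
        return outputStream;
    }
}
```

Note that the comparison is strict, so an event with a value of exactly 1000 does not raise an alert.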
Real-time
• Stream events are processed and alerts are evaluated while the events are streaming
Metadata-Driven
• Stream schema: AlertStreamSchemaEntity
• Policy definition: AlertDefinitionAPIEntity
@Table("alertdef")
@ColumnFamily("f")
@Prefix("alertdef")
@Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
@JsonIgnoreProperties(ignoreUnknown = true)
@TimeSeries(false)
@Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
@Indexes({
    @Index(name="Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true),
})
public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity {
    @Column("a")
    private String desc;
    @Column("b")
    private String policyDef;
    @Column("c")
    private String dedupeDef;
    // … (the slide shows only the beginning of the class)
}
[Diagram: the Metadata Manager feeds the Distributed Real-time Policy Engine through dynamic metadata loading.]
Scalability
• Policy scalability: policies are partitioned
• Event scalability: events are grouped
• Example: N users grouped into 3 partitions and M policies split into 2 partitions give 3 × 2 = 6 physical tasks
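The arithmetic in the example can be sketched as a grid of tasks indexed by (event partition, policy partition). The hash-based assignment below is an assumption made for illustration, not necessarily how Eagle assigns partitions, and `TaskGridSketch` and its methods are hypothetical names.

```java
// Hypothetical grid of physical tasks: one task per (event partition, policy partition) pair.
final class TaskGridSketch {
    private final int eventPartitions;
    private final int policyPartitions;

    TaskGridSketch(int eventPartitions, int policyPartitions) {
        if (eventPartitions <= 0 || policyPartitions <= 0) {
            throw new IllegalArgumentException("partition counts must be positive");
        }
        this.eventPartitions = eventPartitions;
        this.policyPartitions = policyPartitions;
    }

    // Total number of physical tasks, e.g. 3 * 2 = 6.
    int physicalTasks() {
        return eventPartitions * policyPartitions;
    }

    // Assumed assignment: a stable, non-negative hash of the key modulo the partition count.
    static int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Index of the task that evaluates the given policy against events of the given user.
    int taskFor(String user, String policyId) {
        return partitionOf(user, eventPartitions) * policyPartitions
                + partitionOf(policyId, policyPartitions);
    }
}
```

With this layout, adding event partitions spreads users over more tasks, while adding policy partitions reduces the number of policies each task has to evaluate.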
Apache Eagle – Query Framework
Query Syntax
• Full-featured SQL-like REST query (aggregation, sorting…)
Eagle Storage
• NoSQL storage such as HBase
• RDBMS
• Other storage systems
Apache Eagle – ML-based Anomaly Detection
User Activity Anomaly Detection
• User profile feature selection
• Offline user profile generation
• Online anomaly detection
Useful link
• Eagle: User profile-based anomaly detection for securing Hadoop clusters
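The offline/online split listed above can be illustrated with a deliberately simple stand-in: a per-user mean and standard deviation computed offline, and a z-score check applied online. This is not Eagle's actual model, only a sketch of the two phases; `UserProfileSketch`, the single numeric feature, and the threshold parameter are all assumptions.

```java
import java.util.List;

// Hypothetical user profile over one numeric feature (for example, reads per minute).
final class UserProfileSketch {
    final double mean;
    final double stdDev;

    private UserProfileSketch(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }

    // Offline step: summarize the user's historical feature values.
    static UserProfileSketch train(List<Double> history) {
        if (history.isEmpty()) {
            throw new IllegalArgumentException("history must not be empty");
        }
        double sum = 0;
        for (double v : history) {
            sum += v;
        }
        double mean = sum / history.size();
        double squaredDiffs = 0;
        for (double v : history) {
            squaredDiffs += (v - mean) * (v - mean);
        }
        return new UserProfileSketch(mean, Math.sqrt(squaredDiffs / history.size()));
    }

    // Online step: flag an observation more than `threshold` standard deviations from the mean.
    boolean isAnomalous(double observed, double threshold) {
        if (stdDev == 0) {
            return observed != mean; // constant history: any deviation is unusual
        }
        return Math.abs(observed - mean) / stdDev > threshold;
    }
}
```

The profile is cheap to evaluate per event, which is what makes it practical to apply inside a streaming policy evaluator while training runs separately in batch.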
Apache Eagle – Integration I
• Eagle in Apache Ambari
– A native part of the Hadoop ecosystem
– http://eagle.incubator.apache.org/docs/ambari-plugin-install.html
• Eagle in Docker
– Runs natively in cloud and container environments
– https://github.com/apache/incubator-eagle
Apache Eagle – Integration II
• Apache Ranger
– Remediation engine
– Eagle data source
• Splunk
– Eagle alert consumer
– Eagle's alert output is the first layer of analytics abstraction, and Splunk is the second
• Dataguise, Apache Knox
– Eagle data source
Learn more about Apache Eagle
• Eagle: User Profile-based Anomaly Detection in Hadoop Cluster (IEEE)
• Eagle: Distributed Realtime Monitoring Framework for Hadoop Cluster
Q&A
apache/incubator-eagle
@TheApacheEagle
@ApacheEagle
http://eagle.incubator.apache.org

Editor's Notes

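  • #12: Three parts: data collection + real-time stream processing + metric/alert sinks. Two components of Eagle's real-time stream processing framework stand out: the alerting framework based on policy evaluation, and the ML-based automatic anomaly detection module.
  • #14: Data ingestion is decoupled via Apache Kafka. As the data source interface, Kafka (1) provides distributed, high-throughput message transport and (2) has clients in many languages, which makes importing data easy. Eagle can currently process various Hadoop logs (GC logs, audit logs) and JMX metrics and raise alerts on them. It also supports processing data in user-defined formats.
  • #15: Eagle provides a stream processing DSL with which users can easily define the data processing logic graph while staying independent of the underlying execution platform. Users are free to choose the underlying stream processing platform: Eagle currently uses Storm by default and can readily be switched to engines such as Flink or Spark.
  • #16: This diagram maps Eagle code to the logical stream processing graph.
  • #17: The Policy Engine sits in AlertExecutor, the last node of Eagle's stream processing, and is responsible for loading policies dynamically and evaluating them against the event stream. It offers extensibility, usability, real-time processing, scalability, and a metadata-driven design.
  • #18: We use WSO2 Siddhi as the first-class policy engine, but a CEP engine cannot cover everything. One example is node anomaly detection, where we compare all the nodes in the cluster within some time window. Eagle's policy evaluation engine uses Siddhi by default as its underlying complex event processing (CEP) engine: (1) it is a real-time complex stream processing engine, and (2) it offers powerful SQL-like definitions of event stream processing logic. Besides the Siddhi-based policy evaluator, users can easily define their own policy evaluator, such as the evaluator based on an ML-trained model described later.
  • #19: (1) The Siddhi CEP engine that Eagle supports by default handles complex logic well, so it supports a wide range of complex policy definitions. (2) Eagle manages the policy lifecycle dynamically, so policies can be updated in real time. (3) A friendly policy definition UI means users do not need to deal with the underlying policy definition syntax.
  • #22: Metadata comes from two sources: the schema definition of the data stream itself, and the metadata of policy definitions. The benefit of a metadata-driven design is that however the data streams change and however complex the policy definitions become, the Policy Engine keeps working.
  • #25: The basic idea of ML-based anomaly detection: (1) select features that describe the basic characteristics of user behavior; (2) train models/policies for abnormal user behavior offline; (3) detect anomalies in real time with the ML policy evaluator.
  • #26: As a product of the Hadoop ecosystem, Eagle provides an Ambari plugin so that it fits better into that ecosystem. And because Eagle depends on several underlying components (HBase, Kafka, Storm, HDFS), we provide a Docker-based deployment so that users can set up Eagle easily.
  • #27: Eagle provides a more comprehensive solution for securing sensitive data stored in Hadoop.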