The Next AMPLab: Real-time Intelligent Secure Execution
Ion Stoica
October 26, 2016
Berkeley’s AMPLab
2011 – 2016
• Mission: “Make sense of big data”
• 8 faculty, 60+ students
• Governmental and industrial funding
Algorithms, Machines, People
AMPLab Goal and Impact
Goal: Next generation of open source data analytics stack for industry & academia
Berkeley Data Analytics Stack (BDAS)
…
What is next?
RISE: Real-time Intelligent Secure Execution
AMPLab: from batch data to advanced analytics
RISELab: from live data to real-time decisions
Why?
Data only as valuable as the decisions it enables
What does this mean?
• Faster decisions better than slower decisions
• Decisions on fresh data better than decisions on stale data
• Decisions on personalized data better than on generic data
Goal: real-time decisions on live data with strong security
• Real-time decisions: decide in ms
• Live data: the current state of the environment
• Strong security: privacy, confidentiality, integrity
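Typical decision system
• Update path: Data → Preprocess (e.g., train) → Intermediate data (e.g., model)
• Decision path: Query → Query engine / automatic decision engine (reads the intermediate data) → Decision
Update latency is the time spent on the update path; decision latency is the time spent on the decision path.
Want low update latency & low decision latency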
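A minimal Python sketch of the decision system above, separating the update path (data → intermediate data) from the decision path (query → decision) so that the two latencies can be timed independently. The "model" is a toy stand-in, assumed for illustration: per-key running sums, with a value flagged when it exceeds `threshold` times the key's running mean. `DecisionSystem`, `threshold`, and the flagging rule are hypothetical, not part of any RISELab system.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DecisionSystem:
    """Hypothetical decision system with separate update and decision paths."""
    threshold: float  # flag values larger than threshold * running mean
    counts: Dict[str, int] = field(default_factory=dict)
    sums: Dict[str, float] = field(default_factory=dict)

    def update(self, key: str, value: float) -> float:
        """Update path: fold one event into the intermediate data.

        Returns the update latency in seconds.
        """
        start = time.perf_counter()
        self.counts[key] = self.counts.get(key, 0) + 1
        self.sums[key] = self.sums.get(key, 0.0) + value
        return time.perf_counter() - start

    def decide(self, key: str, value: float) -> Tuple[bool, float]:
        """Decision path: answer a query using only the intermediate data.

        Returns (flagged, decision latency in seconds).
        """
        start = time.perf_counter()
        n = self.counts.get(key, 0)
        # Short-circuit: unseen keys are never flagged (and never divide by zero).
        flagged = n > 0 and value > self.threshold * (self.sums[key] / n)
        return flagged, time.perf_counter() - start


if __name__ == "__main__":
    system = DecisionSystem(threshold=3.0)
    for amount in [10.0, 12.0, 11.0]:
        system.update("card-42", amount)
    print(system.decide("card-42", 100.0)[0])  # True: 100 > 3 * 11
    print(system.decide("card-42", 20.0)[0])   # False: 20 <= 33
```

Keeping the paths separate is what lets each be tuned to its own latency target: `decide` never touches raw events, only the intermediate data that `update` maintains.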
Why is it hard?
Want high-quality decisions
• Sophisticated, e.g., fraud, forecast, fleet of drones
• Accuracy: low false positives and negatives
• Robust to noisy and unforeseen data
Want low latency for both updates and decisions
Want strong security: privacy, confidentiality, integrity
Example: Zero-time defense
Problem: zero-day attacks can compromise millions of hosts in seconds
Solution: analyze network flows to detect attacks and patch hosts/software in real time
• Intermediate data: create attack model
• Decision: detect attack, patch
Quality: sophisticated, accurate, robust
Latency: update (sec) / decision (ms)
Security: privacy (encourage users to share logs), integrity
Application | Quality | Update latency | Decision latency | Security
Zero-time defense | sophisticated, accurate, robust | sec | ms | privacy, integrity
Parking assistant | sophisticated, robust | sec | sec | privacy
Disease discovery | sophisticated, accurate | hours | sec/min | privacy, integrity
IoT (smart buildings) | sophisticated, robust | min/hour | sec | privacy, integrity
Earthquake warning | sophisticated, accurate, robust | min | ms | integrity
Chip manufacturing | sophisticated, accurate, robust | min | sec/min | confidentiality, integrity
Fraud detection | sophisticated, accurate | min | ms | privacy, integrity
“Fleet” driving | sophisticated, accurate, robust | sec | sec | privacy, integrity
Virtual companion | sophisticated, robust | min/hour | sec | integrity
Video QoS at scale | sophisticated | min | ms/sec | privacy, integrity
Challenges
• Automated decisions on live data are hard → Real-time, sophisticated decisions that guarantee worst-case behavior on noisy and unforeseen live data
• Poor security: exploits are daily occurrences → Ensure privacy and integrity without impacting functionality
• One-off solutions, expensive, slow to build → General platform: Secure Real-time Decision Stack
RISELab
Research directions
Systems: 100x lower latency, 1,000x higher concurrency than today’s Spark
Machine learning: Robust, on-line ML algorithms
Security: achieve privacy, confidentiality, and integrity without impacting performance or functionality
Early work
Drizzle
Opaque
Streaming
Micro-batching vs. record-at-a-time
Micro-batching (e.g., Spark) inherits batch’s properties
• fault-tolerance
• straggler mitigation
• optimizations
• unification with other libraries
Record-at-a-time (e.g., Storm, Flink): typically lower latency
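The latency gap can be seen with a deliberately simplified model (an assumption that ignores queuing, network, and failures): a record-at-a-time engine pays a per-record cost as soon as a record arrives, while a micro-batch engine first buffers each record until its batch interval closes and then processes the whole batch as one job. The function names and cost parameters below are hypothetical.

```python
import math
from typing import List


def record_at_a_time_latency(arrivals: List[float], per_record_cost: float) -> List[float]:
    """Each record is processed immediately on arrival (no queuing assumed)."""
    return [per_record_cost for _ in arrivals]


def micro_batch_latency(arrivals: List[float], batch_interval: float,
                        per_batch_cost: float) -> List[float]:
    """Each record waits for the end of its batch interval, then the whole
    batch is processed as a single job taking per_batch_cost seconds."""
    latencies = []
    for t in arrivals:
        batch_end = (math.floor(t / batch_interval) + 1) * batch_interval
        latencies.append(batch_end + per_batch_cost - t)
    return latencies


if __name__ == "__main__":
    arrivals = [0.1, 0.4, 0.9]  # arrival times in seconds
    print(record_at_a_time_latency(arrivals, per_record_cost=0.01))
    print(micro_batch_latency(arrivals, batch_interval=1.0, per_batch_cost=0.2))
```

In this model a record can wait up to a full batch interval before processing even starts, and every batch also pays the fixed per-batch cost, which is why reducing per-batch overhead matters for micro-batching systems.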
Yahoo’s streaming benchmark
Pipeline: ad events → streaming system → ad counts per campaign
Input: 20M JSON ad-events / second, 100 campaigns
Output: ad counts per campaign over a 10 sec window
Latency: (time last event was processed) – (end of window)
SLA: 1 sec
Findings: Storm and Flink indeed provide lower latency than Spark
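A small Python sketch of the benchmark's computation, assuming events arrive as `(event_time, campaign_id)` pairs and that the time each event was processed is also recorded. It counts ads per campaign in tumbling 10-second windows and computes each window's final-event latency as the time its last event was processed minus the window's end, then checks it against the 1-second SLA. The helper names are hypothetical; this is not the benchmark's own code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 10.0
SLA_SECONDS = 1.0


def window_start(event_time: float, window: float = WINDOW_SECONDS) -> float:
    """Start of the tumbling window that contains event_time."""
    return (event_time // window) * window


def counts_per_campaign(events: Iterable[Tuple[float, str]],
                        window: float = WINDOW_SECONDS) -> Dict[Tuple[float, str], int]:
    """Ad counts keyed by (window start, campaign)."""
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for event_time, campaign in events:
        counts[(window_start(event_time, window), campaign)] += 1
    return dict(counts)


def final_event_latency(processed: Iterable[Tuple[float, float]],
                        window: float = WINDOW_SECONDS) -> Dict[float, float]:
    """processed holds (event_time, processing_time) pairs.

    Returns, per window start: (time last event was processed) - (end of window).
    """
    last_processed: Dict[float, float] = {}
    for event_time, processing_time in processed:
        w = window_start(event_time, window)
        last_processed[w] = max(processing_time, last_processed.get(w, processing_time))
    return {w: t - (w + window) for w, t in last_processed.items()}


if __name__ == "__main__":
    events = [(1.0, "a"), (2.5, "b"), (9.9, "a"), (10.0, "a")]
    print(counts_per_campaign(events))
    latency = final_event_latency([(1.0, 1.2), (9.9, 10.3)])
    print({w: lat <= SLA_SECONDS for w, lat in latency.items()})
```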
Spark Streaming
For every micro-batch: the master schedules tasks on the workers, the workers process the batch, and the workers report cluster status back to the master
Drizzle
Goal: reduce Spark streaming latency by at least 10x
Key observation: consecutive iterations use the same DAG
Solution: push scheduling decisions to workers
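Group scheduling
Spark Streaming: the master schedules the tasks of each batch, and workers report cluster status back after every batch
Drizzle: the master schedules the tasks for a group of batches at once; workers process the batches in the group and report cluster status back to the master once per group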
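The benefit of group scheduling can be sketched with a toy cost model, assuming (hypothetically) a fixed master-to-worker coordination round trip per scheduling decision: per-batch scheduling pays it once per micro-batch, while scheduling a group of `group_size` batches at once pays it once per group. The function names and the 50 ms round trip in the example are illustrative assumptions, not measurements from the talk.

```python
import math
from typing import List


def schedule_groups(batch_ids: List[int], group_size: int) -> List[List[int]]:
    """Split consecutive micro-batches into groups; each group is scheduled by
    the master in one decision and then executed by the workers."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [batch_ids[i:i + group_size] for i in range(0, len(batch_ids), group_size)]


def coordination_overhead_ms(num_batches: int, round_trip_ms: float,
                             group_size: int = 1) -> float:
    """Total scheduling/status round-trip cost. group_size=1 models
    per-batch scheduling; group_size>1 models group scheduling."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return math.ceil(num_batches / group_size) * round_trip_ms


if __name__ == "__main__":
    print(schedule_groups(list(range(5)), group_size=2))                     # [[0, 1], [2, 3], [4]]
    print(coordination_overhead_ms(100, round_trip_ms=50.0))                 # 5000.0
    print(coordination_overhead_ms(100, round_trip_ms=50.0, group_size=10))  # 500.0
```

Under the same assumptions, the trade-off is adaptivity: with larger groups the master regains control less often, so it reacts more slowly to changes in the cluster.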
Latency
[Figure: CDF of final event latency (ms, 0–3000) for Spark and Flink]
Latency
[Figure: CDF of final event latency (ms, 0–3000) for Spark, Drizzle, and Flink]
Drizzle: similar latency to Flink
Latency, w/ ReduceBy optimization
[Figure: CDF of final event latency (ms, 0–800) for Spark, Drizzle, and Flink]
Aggregate counters on map side to reduce shuffle traffic
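Map-side aggregation can be illustrated with a short Python sketch (a simplified model, not Drizzle's code): without it, every ad event is shuffled as a `(campaign, 1)` record; with it, each map partition first combines its own counts, so at most one record per campaign per partition crosses the network. Spark's `reduceByKey` applies the same idea by merging values locally on each mapper before the shuffle. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def shuffle_without_combine(partitions: List[List[str]]) -> List[Record]:
    """Every event is shuffled as (campaign, 1)."""
    return [(campaign, 1) for part in partitions for campaign in part]


def shuffle_with_combine(partitions: List[List[str]]) -> List[Record]:
    """Each map partition pre-aggregates its events before shuffling."""
    shuffled: List[Record] = []
    for part in partitions:
        shuffled.extend(Counter(part).items())
    return shuffled


def reduce_counts(shuffled: List[Record]) -> Dict[str, int]:
    """Reduce side: sum the partial counts per campaign."""
    totals: Counter = Counter()
    for campaign, count in shuffled:
        totals[campaign] += count
    return dict(totals)


if __name__ == "__main__":
    partitions = [["a", "a", "b"], ["a", "b", "b", "b"]]
    plain = shuffle_without_combine(partitions)
    combined = shuffle_with_combine(partitions)
    print(len(plain), len(combined))                        # 7 4
    print(reduce_counts(plain) == reduce_counts(combined))  # True
```

The final counts are identical either way; only the volume of shuffled records changes, which is where the latency saving comes from.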
Fault tolerance
[Figure: latency (ms, log scale) over time (seconds) for Drizzle, Spark, and Flink]
Four-nines SLA: 8.6 sec per day exceeding SLA
Recovers 5x faster than Flink with 10x lower latency
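The 8.6-second figure follows from the definition of a four-nines SLA: if the SLA must hold 99.99% of the time, the remaining 0.01% of the 86,400 seconds in a day may exceed it. A tiny Python check of that arithmetic (the function name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_over_sla_per_day(nines: int) -> float:
    """Seconds per day allowed to exceed the SLA when it must hold for a
    fraction 1 - 10**-nines of the time (e.g., 4 nines = 99.99%)."""
    return SECONDS_PER_DAY * 10 ** (-nines)


if __name__ == "__main__":
    print(round(seconds_over_sla_per_day(4), 1))  # 8.6
```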
Early results
Drizzle
Opaque
State-of-the-art security today
Authentication, encryption at-rest and in-motion
Stack: Spark Core (Spark Streaming, Spark SQL, MLlib, GraphX) on top of the OS (e.g., Linux), cluster manager (e.g., Kubernetes), and hypervisor (e.g., Xen), in a private/public cluster
Not enough if the OS or hypervisor is compromised and the attacker gets root access
Not enough if the attacker can observe network and memory access patterns
Opaque
Leverage Intel’s SGX: hardware enclave
Implement secure distributed relational algebra
Stack: Spark Streaming, Spark SQL, MLlib, GraphX → Query Optimizer (Catalyst) → encrypted execution operators: enc-filter, enc-join, enc-agg, enc-sort
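To make the integrity side of an encrypted operator concrete, here is a hedged Python sketch of a filter that refuses to process tampered input: each row carries an HMAC-SHA256 tag, and the operator recomputes and checks the tag before evaluating the predicate. This illustrates authentication only, with the standard-library `hmac` module standing in for the authenticated encryption that an enclave-based system would use; it provides no confidentiality. `seal` and `verified_filter` are hypothetical names, not Opaque's API.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
SealedRow = Tuple[bytes, bytes]  # (serialized row, MAC tag)


def seal(row: Row, key: bytes) -> SealedRow:
    """Serialize a row and attach an HMAC-SHA256 tag (integrity only, no encryption)."""
    payload = json.dumps(row, sort_keys=True).encode()
    return payload, hmac.new(key, payload, hashlib.sha256).digest()


def verified_filter(rows: List[SealedRow], key: bytes,
                    predicate: Callable[[Row], bool]) -> List[SealedRow]:
    """Check every row's tag before evaluating the predicate; abort on tampering."""
    kept: List[SealedRow] = []
    for payload, tag in rows:
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("row failed integrity check")
        if predicate(json.loads(payload)):
            kept.append((payload, tag))
    return kept


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    rows = [seal({"id": 1, "amount": 5}, key), seal({"id": 2, "amount": 50}, key)]
    kept = verified_filter(rows, key, lambda r: r["amount"] > 10)
    print([json.loads(p)["id"] for p, _ in kept])  # [2]
```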
Opaque: two modes
Encryption mode
• Protect against compromised software (e.g., OS)
• Full data encryption, authentication, and computation verification in hardware enclave
Oblivious mode
• Additionally, hide data access pattern
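Oblivious mode hides which memory locations and records an operator touches. A standard building block for this is a sorting network, whose sequence of compare-exchange positions depends only on the input size, never on the values. The Python sketch below, a textbook bitonic sort for power-of-two lengths, records the positions it touches so that two different inputs can be shown to produce the same access trace. It illustrates the idea only: Python gives no constant-time or cache-level guarantees, and the function names are hypothetical rather than Opaque's implementation.

```python
from typing import List, Tuple

Trace = List[Tuple[int, int]]


def compare_exchange(a: List[int], i: int, j: int, ascending: bool) -> None:
    """Read and write both positions regardless of whether a swap is needed."""
    x, y = a[i], a[j]
    swap = (x > y) == ascending
    a[i], a[j] = (y, x) if swap else (x, y)


def bitonic_sort(values: List[int]) -> Tuple[List[int], Trace]:
    """Sort ascending with a bitonic network; return (sorted list, access trace)."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    trace: Trace = []
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # compare distance within the current merge step
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    trace.append((i, partner))
                    compare_exchange(a, i, partner, ascending=(i & k) == 0)
            j //= 2
        k *= 2
    return a, trace


if __name__ == "__main__":
    sorted_a, trace_a = bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6])
    sorted_b, trace_b = bitonic_sort([1, 2, 3, 4, 5, 6, 7, 8])
    print(sorted_a)            # [1, 2, 3, 4, 5, 6, 7, 8]
    print(trace_a == trace_b)  # True: same positions touched for both inputs
```

The price of obliviousness is visible here: the network always performs O(n log² n) compare-exchanges, even on already-sorted input.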
Opaque: Big Data Benchmark
[Figure: runtime (s, log scale) of Queries 1–3 for SparkSQL, Opaque encryption, and Opaque oblivious]
Encrypted operators implemented in C++
Up to 100x slower than SparkSQL, but 1,000x faster than the state of the art
Next AMPLab: RISELab
Already promising results
Expect much more over the next five years!
Goal: develop the Secure Real-time Decision Stack, an open source platform, tools, and algorithms for real-time decisions on live data with strong security
Thank you
AMPLab alumni presenting here
Example: “Fleet” driving
Problem: suboptimal driving decisions
Solution: collect & leverage info from other cars and drivers in real time
• Intermediate data: automatically annotate maps, actions of other drivers
• Decision: avoid obstacles, congestion
Quality: sophisticated, accurate, noise tolerant
Performance: sec (decision) / sec (update)
Security: privacy, data integrity
Not only hypothetical
• Attacks getting root access by exploiting OS/DB vulnerabilities
• Attacks exploiting access pattern leakage

The Next AMPLab: Real-Time, Intelligent, and Secure Computing

Editor's Notes

  • #2: Good morning, and welcome to Spark Summit Europe. It’s fantastic to see so much excitement. Today, I’m very excited to tell you about our plans at Berkeley following AMPLab.
  • #3: But first, let me say a few words about AMPLab, as many of you might not be familiar with it. AMPLab is a project at the University of California, Berkeley that started in 2011 and will end this year. The vision of AMPLab was to make sense of big data using a holistic approach involving algorithms (in particular ML algorithms), machines (i.e., systems), and people (i.e., crowdsourcing); hence AMP, the name of the lab. The lab was funded both by the government (NSF and DARPA) and by almost 40 companies, including Amazon, Google, IBM, SAP, and many others.
  • #4: The goal of the lab was to build next gen… So why should you care? Well, AMPLab was the original place where Apache Spark, which all of us are using today, was developed. In addition, at AMPLab we developed a number of other systems that you might have heard of or used, including Apache Mesos and Alluxio, formerly known as Tachyon.
  • #18: As you know, there are basically two types of streaming systems: record-at-a-time and micro-batching. Spark is an example of a micro-batching system. As such, as Matei emphasized in his talk, it inherits all the desirable properties of the batch model: fault tolerance, straggler mitigation, optimizations, and unification with other libraries. In contrast, record-at-a-time systems such as Storm and Flink typically provide lower latency.