Big Data
Technical Benchmarking
Arne J. Berre, SINTEF,
Todor Ivanov, Univ. Frankfurt,
Tomas Pariente Lobo, Atos
BDVe – DataBench Webinar, October 9, 2018
11/10/2018 DataBench Project - GA Nr 780966
Technical Benchmarks in DataBench Workflow
[Figure: DataBench workflow showing Business Benchmarks and Technical Benchmarks. © IDC]
Goals & Objectives
• Holistic benchmarking approach for big data: The DataBench Toolbox will be a component-based system of both vertical (holistic/business/data type driven) and horizontal (technical area based) big data benchmarks, following the layered architecture provided by the BDVA reference model.
• Not reinventing the wheel, but using wheels to build a new car: It should be able to work or integrate with existing benchmarking initiatives and resources where possible.
• Filling gaps: The Toolbox will investigate gaps of industrial significance in the big data benchmarking field and contribute to overcoming them.
• Homogenising metrics: The Toolbox will implement ways to derive, as far as possible, the DataBench technical metrics and business KPIs from the metrics extracted from the integrated benchmarks.
• Web user interface: It will include a web-based visualization layer to help end users specify their benchmarking requirements, such as the selected benchmark, data generators, workloads, metrics and the preferred data, volume and velocity, as well as searching and monitoring capabilities.
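One way to picture the "homogenising metrics" goal is a thin adapter layer that maps each benchmark's native result fields onto a shared record. The sketch below is illustrative only: the `Metric` record, the two benchmark names, their field names, and the unit conversions are all assumptions made for this example, not the DataBench Toolbox schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Metric:
    """A benchmark result expressed in a common vocabulary (hypothetical)."""
    benchmark: str
    name: str
    value: float
    unit: str


# Hypothetical mapping: native field -> (common name, common unit, divisor).
FIELD_MAP: Dict[str, Dict[str, Tuple[str, str, float]]] = {
    "kv_benchmark": {
        "ops_per_sec": ("throughput", "ops/s", 1.0),
        "avg_latency_us": ("latency", "ms", 1000.0),
    },
    "batch_benchmark": {
        "bytes_per_sec": ("throughput", "MiB/s", 1048576.0),
        "duration_s": ("execution_time", "s", 1.0),
    },
}


def homogenise(benchmark: str, raw: Dict[str, float]) -> List[Metric]:
    """Translate one benchmark's raw results into common Metric records.

    Fields without a mapping are skipped rather than guessed.
    """
    mapping = FIELD_MAP[benchmark]
    metrics = []
    for field, value in raw.items():
        if field in mapping:
            name, unit, divisor = mapping[field]
            metrics.append(Metric(benchmark, name, float(value) / divisor, unit))
    return metrics


if __name__ == "__main__":
    for m in homogenise("kv_benchmark", {"ops_per_sec": 12000, "avg_latency_us": 2500}):
        print(m)
```

Keeping the mapping as data rather than code means a new benchmark could be integrated by adding a table entry, which is in the spirit of reusing existing benchmarks instead of rewriting them.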
BDV Reference Model
Identifying and Selecting Benchmarks
[Figure: matrix mapping big data benchmarks (columns, with a timeline from 1999 to 2018) to the areas of the BDVA Reference Model (rows). The individual cell marks are not recoverable from the extracted text.]
Rows (BDVA Reference Model: verticals incl. data types, analytics, processing, data management, infrastructure):
• Domain/Sector/Business solutions KPIs (Manufacturing, Transport, Energy, .. Domain X …)
• Standards
• MetaData
• Graph, Network
• Text, NLP, Web
• Image, Audio
• Spatio Temp
• Time Series, IoT
• Structured, BI
• Visual Analytics
• Industrial Analytics (Descriptive, Diagnostic, Predictive, Prescriptive)
• Machine Learning, AI, Data Science
• Streaming/Realtime Processing
• Interactive Processing
• Batch Processing
• Data Privacy/Security
• Data Governance/Mgmt
• Data Storage
• Communication & Connectivity
• Cloud Services & HPC, Edge
Columns (benchmarks): TPC-H, TPC-DS v1, LinearRoad, Hadoop Workload Examples, GridMix, PigMix, MRBench, CALDA, HiBench, YCSB, SWIM, CloudRank-D, PUMA Benchmark Suite, CloudSuite, MRBS, AMPLab Big Data Benchmark, BigBench, BigDataBench, LinkBench, BigFrame, PRIMEBALL, Semantic Publishing Benchmark (SPB), Social Network Benchmark, StreamBench, TPCx-HS, SparkBench, TPCx-V, BigFUN, TPC-DS v2, TPCx-BB, Graphalytics, Yahoo Streaming Benchmark (YSB), DeepBench, DeepMark, TensorFlow Benchmarks, Fathom, AdBench, RIoTBench, Hobbit Benchmark, TPCx-HS v2, BigBench V2, Sanzu, Penn Machine Learning Benchmark (PMLB), OpenML benchmark suites, Senska, DAWNBench/MLPerf, IDEBench, ABench
Timeline labels: 1999, 2002, 2004, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018
Annotation: Updating with new Benchmarks
Dimensions of Technical Benchmarks
Summary
Category | Year | Name | Type | Domain | Data Type
Micro-benchmarks | 2010 | HiBench | Micro-benchmark Suite | Micro-benchmarks, Machine Learning, SQL, Websearch, Graph, Streaming Benchmarks | Structured, Text, Web Graph
Micro-benchmarks | 2015 | SparkBench | Micro-benchmark Suite | Machine Learning, Graph Computation, SQL, Streaming Application | Structured, Text, Web Graph
Micro-benchmarks | 2010 | YCSB | Micro-benchmark | cloud OLTP operations | Structured
Micro-benchmarks | 2017 | TPCx-IoT | Micro-benchmark | workloads on typical IoT Gateway systems | Structured, IoT
Application Benchmarks | 2015 | Yahoo Streaming Benchmark | Application Streaming Benchmark | advertisement analytics pipeline | Structured, Time Series
Application Benchmarks | 2013 | BigBench/TPCx-BB | Application End-to-end Benchmark | a fictional product retailer platform | Structured, Text, JSON logs
Application Benchmarks | 2017 | BigBench V2 | Application End-to-end Benchmark | a fictional product retailer platform | Structured, Text, JSON logs
Application Benchmarks | 2018 | ABench (Work-in-Progress) | Big Data Architecture Stack Benchmark | set of different workloads | Structured, Text, JSON logs
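The summary table can be treated as a small catalogue that a user filters by category or data type when selecting a benchmark. The following is a minimal sketch of that idea: the rows are transcribed from the table, while the `Benchmark` record and the `select` helper are hypothetical names introduced only for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Benchmark(NamedTuple):
    category: str              # "micro" or "application"
    year: int
    name: str
    data_types: FrozenSet[str]


# Rows transcribed from the summary table; data types are lower-cased for matching.
CATALOGUE = [
    Benchmark("micro", 2010, "HiBench", frozenset({"structured", "text", "web graph"})),
    Benchmark("micro", 2015, "SparkBench", frozenset({"structured", "text", "web graph"})),
    Benchmark("micro", 2010, "YCSB", frozenset({"structured"})),
    Benchmark("micro", 2017, "TPCx-IoT", frozenset({"structured", "iot"})),
    Benchmark("application", 2015, "Yahoo Streaming Benchmark",
              frozenset({"structured", "time series"})),
    Benchmark("application", 2013, "BigBench/TPCx-BB",
              frozenset({"structured", "text", "json logs"})),
    Benchmark("application", 2017, "BigBench V2",
              frozenset({"structured", "text", "json logs"})),
    Benchmark("application", 2018, "ABench",
              frozenset({"structured", "text", "json logs"})),
]


def select(category: Optional[str] = None, data_type: Optional[str] = None) -> List[str]:
    """Return benchmark names matching the given filters (None means 'any')."""
    return [
        b.name
        for b in CATALOGUE
        if (category is None or b.category == category)
        and (data_type is None or data_type in b.data_types)
    ]


if __name__ == "__main__":
    print(select(data_type="iot"))
    print(select(category="application", data_type="text"))
```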
Some of the benchmarks to integrate (I)
Micro-benchmarks:
Year | Name | Type
2010 | HiBench | Big data benchmark suite for evaluating different big data frameworks. 19 workloads, including synthetic micro-benchmarks and real-world applications, from 6 categories: micro, machine learning, SQL, graph, websearch and streaming.
2015 | SparkBench | System for benchmarking and simulating Spark jobs. Multiple workloads organized in 4 categories.
2010 | Yahoo! Cloud Serving Benchmark (YCSB) | Evaluates the performance of different “key-value” and “cloud” serving systems, which do not support the ACID properties. YCSB++, an extension, includes many additions such as multi-tester coordination for increased load and eventual consistency measurement.
2017 | TPCx-IoT | Based on YCSB, but with significant changes. Workloads of data ingestion and concurrent queries simulating workloads on typical IoT Gateway systems. The dataset contains sensor data from electric power station(s).
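To make the idea of a key-value serving micro-benchmark concrete, here is a toy driver that issues a mix of reads and updates against an in-memory dictionary and reports throughput. It is not YCSB and does not use its API: the function name, the 50/50 operation mix, and the key naming are assumptions chosen for this sketch.

```python
import random
import time
from typing import Dict


def run_workload(store: Dict[str, int], operations: int = 10_000,
                 read_proportion: float = 0.5, seed: int = 42) -> Dict[str, float]:
    """Issue `operations` random reads/updates against `store` and time them."""
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    keys = list(store)
    reads = updates = 0
    start = time.perf_counter()
    for _ in range(operations):
        key = rng.choice(keys)
        if rng.random() < read_proportion:
            _ = store[key]             # read operation
            reads += 1
        else:
            store[key] = rng.getrandbits(32)   # update operation
            updates += 1
    elapsed = time.perf_counter() - start
    throughput = operations / elapsed if elapsed > 0 else float("inf")
    return {"reads": reads, "updates": updates,
            "elapsed_s": elapsed, "throughput_ops_s": throughput}


if __name__ == "__main__":
    table = {f"user{i}": i for i in range(1_000)}
    print(run_workload(table))
```

A real serving benchmark would replace the dictionary with a client for the system under test and would usually record per-operation latencies as well as overall throughput.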
Some of the benchmarks to integrate (II)
Application-oriented benchmarks:
Year | Name | Type
2015 | Yahoo Streaming Benchmark (YSB) | The Yahoo Streaming Benchmark is a streaming application benchmark simulating an advertisement analytics pipeline.
2013 | BigBench/TPCx-BB | BigBench is an end-to-end, technology-agnostic, application-level benchmark that tests the analytical capabilities of a Big Data platform. It is based on a fictional product retailer business model.
2017 | BigBench V2 | Similar to BigBench, BigBench V2 is an end-to-end, technology-agnostic, application-level benchmark that tests the analytical capabilities of a Big Data platform.
2018 | ABench (Work-in-Progress) | New type of multi-purpose Big Data benchmark covering many big data scenarios and implementations. Extends other benchmarks such as BigBench.
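As an illustration of what an advertisement analytics pipeline computes, the sketch below filters a stream of ad events, maps each ad to its campaign, and counts views per campaign per time window. This is a simplified batch version written for clarity, not the benchmark's implementation: the event field names, the 10-second window, and the `count_views` function are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 10_000  # assumed window length in milliseconds


def count_views(events: Iterable[dict], ad_to_campaign: Dict[str, str],
                window_ms: int = WINDOW_MS) -> Dict[Tuple[str, int], int]:
    """Count 'view' events per (campaign, window start) pair."""
    counts: Counter = Counter()
    for event in events:
        if event["event_type"] != "view":
            continue                                   # filter step
        campaign = ad_to_campaign.get(event["ad_id"])  # join step
        if campaign is None:
            continue                                   # unknown ad: drop the event
        window_start = event["event_time"] // window_ms * window_ms
        counts[(campaign, window_start)] += 1          # windowed aggregation
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"ad_id": "a1", "event_type": "view", "event_time": 1_000},
        {"ad_id": "a2", "event_type": "click", "event_time": 2_000},
        {"ad_id": "a2", "event_type": "view", "event_time": 9_999},
        {"ad_id": "a1", "event_type": "view", "event_time": 10_000},
    ]
    print(count_views(sample, {"a1": "c1", "a2": "c1"}))
```

In a streaming engine the same three steps (filter, join, windowed count) would run continuously over unbounded input, and the benchmark would measure how quickly the window results become available.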
• The BigBench specification comprises two key components:
  • a data model specification
  • a workload/query specification
• The structured part of the BigBench data model is adopted from the TPC-DS data model.
• The data model specification is implemented by a data generator, which is based on an extension of PDGF.
• The BigBench 1.0 workload specification consists of 30 queries/workloads (10 structured from TPC-DS, and 20 adapted from a McKinsey report on Big Data use cases and opportunities).
• BigBench 2.0 …
[Figure: The BigBench data model]
[Figure: The BigBench 2.0 overview]
Rabl T., et al. The Vision of BigBench 2.0, 2016. Proceedings of the Fourth Workshop on Data analytics in the Cloud. Article No. 3.
http://blog.cloudera.com/blog/2014/11/bigbench-toward-an-industry-standard-benchmark-for-big-data-analytics/
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
• Benchmark any step of the Linked Data lifecycle.
• Ensure that benchmarking results can be found, accessed, integrated and reused easily (FAIR principles).
• Benchmark Big Data platforms by being the first distributed benchmarking platform for Linked Data.
• The HOBBIT platform comprises several components:
  • Single components are implemented as independent containers.
  • Communication between these components is done via a message bus.
  • Everything is dockerized, from the benchmarked system to all the components.
Principles:
• Users can test systems with the HOBBIT benchmarks without having to worry about finding standardized hardware.
• New benchmarks can be easily created and added to the platform by third parties.
• The evaluation can be scaled out to large datasets and on distributed architectures.
• The publishing and analysis of the results of different systems can be carried out in a uniform manner across the different benchmarks.
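The architecture described above, with independent components that communicate only through a message bus, can be sketched in a single process. In the example below, threads stand in for containers and `queue.Queue` stands in for the message bus; the component names (`data_generator`, `system_under_test`, `evaluator`) and the squaring task are illustrative assumptions, not the HOBBIT platform's API.

```python
import queue
import threading
from typing import List

SENTINEL = None  # end-of-stream marker passed along the bus


def data_generator(out_q: queue.Queue, n: int) -> None:
    """Publish n tasks, then signal the end of the stream."""
    for i in range(n):
        out_q.put(i)
    out_q.put(SENTINEL)


def system_under_test(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Consume tasks and publish (task, answer) pairs; here it squares numbers."""
    while True:
        task = in_q.get()
        if task is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put((task, task * task))


def evaluator(in_q: queue.Queue, results: List[bool]) -> None:
    """Check each answer against the expected value."""
    while True:
        message = in_q.get()
        if message is SENTINEL:
            break
        task, answer = message
        results.append(answer == task * task)


def run_benchmark(n: int = 5) -> List[bool]:
    tasks: queue.Queue = queue.Queue()
    answers: queue.Queue = queue.Queue()
    results: List[bool] = []
    workers = [
        threading.Thread(target=data_generator, args=(tasks, n)),
        threading.Thread(target=system_under_test, args=(tasks, answers)),
        threading.Thread(target=evaluator, args=(answers, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_benchmark())
```

Because the components share nothing except the queues, any of them could be swapped for a different implementation without touching the others, which is the property that makes it easy for third parties to add new benchmarks or systems.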
Summary
• DataBench:
  • A framework for big data benchmarking for PPP projects and big data practitioners
  • We will provide methodology and tools
• Added value:
  • An umbrella for accessing multiple benchmarks
  • Homogenized technical metrics
  • Derived business KPIs
  • A community around
• PPP projects, industrial partners (BDVA and beyond) and benchmarking initiatives are welcome to work with us, either to use our framework or to add new benchmarks.
Big Data Benchmark session at EBDVF'2018
Monday, November 12th, 17:00 – 18:30, EBDVF'2018, Vienna
Arne.J.Berre@sintef.no
todor@dbis.cs.uni-frankfurt.de
tomas.parientelobo@atos.net
Evidence Based Big Data Benchmarking to
Improve Business Performance

Editor's Notes

  • #2: Gabriella Cattaneo (IDC) will provide ideas on how big data benchmarking could help organizations gain better business insights and make informed decisions.
  • #5: The approach is as follows: The DataBench Toolbox allows existing and new benchmarks to be reused in a uniform way, and collects and calculates coherent metrics that make them comparable. This Toolbox will be used by different stakeholders (BDV PPP projects, industries, etc.) to benchmark and compare with others. The metrics will then be derived and transformed into a set of coherent business KPIs of industrial significance.
  • #15: I miss the business KPI formulas from WP2 to WP3. WP3 will implement the transformation, but the design should come from WP2. I also miss a functional overview of the whole process, independent of the WP view (more conceptual).