Big Data Europe Integrator Platform
Empowering Communities with Data Technologies
Technical Contributions
BDE Webinar - 27 April
Dr. Hajira Jabeen
Senior researcher
University of Bonn
Platform Goals
◎Open source
◎Ease of Use
◎Support a variety of use cases
◎Embrace emerging Big Data Technologies
◎Simple integration with custom components
Key actors
Platform Architecture
◎ Support Layer: Init Daemon, GUIs, Monitor
◎ App Layer: Traffic Forecast, Satellite Image Analysis
◎ Platform Layer: Spark, Flink, Semantic Layer (Ontario, SANSA, Semagrow), Kafka, Real-time Stream Monitoring, ...
◎ Resource Management Layer (Swarm)
◎ Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)
◎ Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, ..., RDF Store
BDE Supported Frameworks
◎ Search/indexing
o Apache Solr
◎ Data acquisition
o Apache Flume
◎ Message passing
o Apache Kafka
◎ Data storage
o Hue
o Apache Cassandra
o ScyllaDB
o Apache Hive
o PostGIS
◎ Data processing
o Apache Spark
o Apache Flink
◎ Semantic Components
o Strabon
o Sextant
o GeoTriples
o Silk
o SEMAGROW
o LIMES
o 4Store
o OpenLink Virtuoso
Platform features
◎ BDE Development Environment
o Stack builder
o Workflow builder
o Instructions to add custom components to the BDE stack
◎ Administrator Interface
o SwarmUI
◎ UI Integrator
o Workflow monitor
o Integrated web interface
Platform installation
◎Manual installation guide
◎Using Docker Machine
o On local machine (VirtualBox)
o In cloud (AWS, DigitalOcean, Azure)
o Bare metal
◎Screencasts
Deploying a Big Data Stack
◎ Stack
o collection of communicating components
o to solve a specific problem
◎ Described in Docker Compose
o Component configuration
o Application topology
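A minimal sketch of the idea, assuming a hypothetical three-component stack: the dictionary below mirrors the `services` mapping of a Docker Compose file (`image`, `environment`, and `depends_on` are standard Compose keys), while the service names and image names are placeholders and not actual BDE images. Component configuration sits inside each service entry, and the application topology can be read from the `depends_on` links.

```python
# Hypothetical stack, written as the Python equivalent of a Docker Compose
# "services" mapping. Service and image names are placeholders, not BDE images.
stack = {
    "services": {
        "store": {
            "image": "example/store:latest",
            "environment": {"STORE_PORT": "9000"},
        },
        "processor": {
            "image": "example/processor:latest",
            "environment": {"STORE_HOST": "store"},
            "depends_on": ["store"],
        },
        "dashboard": {
            "image": "example/dashboard:latest",
            "depends_on": ["processor"],
        },
    }
}


def topology(stack):
    """Return (component, dependency) edges declared via depends_on."""
    edges = []
    for name, spec in stack["services"].items():
        for dep in spec.get("depends_on", []):
            edges.append((name, dep))
    return edges


def undefined_dependencies(stack):
    """List dependencies that point at components missing from the stack."""
    services = stack["services"]
    return [(name, dep) for name, dep in topology(stack) if dep not in services]


if __name__ == "__main__":
    for name, dep in topology(stack):
        print(f"{name} -> {dep}")
```

Checking `undefined_dependencies(stack)` before deployment is one simple way to catch a component that refers to a neighbour the stack never defines.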
Enhancing the Component
◎ Orchestrator required for initialization process (init_daemon)
o Components may depend on each other
o Components may require manual intervention
◎ User Interface Integration
o Standard Interfaces from components
o Combine and align the interfaces
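The initialization problem described above can be sketched as a dependency ordering. This is an illustration only, not the init_daemon's actual interface: the component names, the `needs_manual_step` set, and the `startup_plan` helper are all hypothetical. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to order components so that each one starts after the components it depends on, and marks those that would wait for an operator.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components: name -> names that must be initialized first.
dependencies = {
    "store": set(),
    "processor": {"store"},
    "dashboard": {"processor"},
}

# Hypothetical components that need an operator to confirm a manual step.
needs_manual_step = {"store"}


def startup_plan(dependencies, needs_manual_step):
    """Return (component, action) pairs in a dependency-respecting order."""
    order = TopologicalSorter(dependencies).static_order()
    return [
        (name, "wait for operator" if name in needs_manual_step else "start")
        for name in order
    ]


if __name__ == "__main__":
    for name, action in startup_plan(dependencies, needs_manual_step):
        print(f"{name}: {action}")
```

If the declared dependencies form a cycle, `static_order()` raises `graphlib.CycleError`, which is a reasonable point to reject the stack description.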
User Interfaces
◎Target: Facilitate use of the platform
o User Interface Adaptation
◎Available interfaces
o Workflow UIs
❖ Workflow Builder
❖ Workflow Monitor
o Swarm UI
o Integrator UI
BDE Workflow Builder
Component 1
Component 2
Component 3
BDE Workflow Monitor
o Component 1: Finished
o Component 2: Finished
o Component 3: In progress
Swarm UI
o Increase number of instances
Integrator UI
o Component 1
o Component 2
Beyond the state of the art ...
Smart Big Data
Increase the value of Big Data by adding meaning to it!
Semantic Data Lake (Ontario)
◎Data Swamp
o Repository of data in its raw format
o Structured, semi-structured, unstructured
o Schema-less
◎Data Lake
o Add a Semantic layer on top of the source datasets
o The data is semantically lifted using existing
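As a rough illustration of what semantic lifting can look like, the sketch below turns a raw record into subject-predicate-object triples through a column-to-term mapping. It is not Ontario's API: the `lift` function, the sample record, the mapping, and the `ex:` prefix are all assumptions made for this example, and `ex:` stands for a placeholder namespace rather than a real vocabulary.

```python
# Hypothetical raw record, as it might sit in the lake in its source format.
raw_rows = [
    {"id": "s1", "temp_c": "21.5", "city": "Bonn"},
]

# Hypothetical mapping from source column names to vocabulary terms.
# The "ex:" prefix is a placeholder namespace, not a real vocabulary.
mapping = {
    "temp_c": "ex:temperature",
    "city": "ex:locatedIn",
}


def lift(row, mapping, id_field="id", base="ex:sensor/"):
    """Turn one raw record into (subject, predicate, object) triples."""
    subject = base + row[id_field]
    return [
        (subject, predicate, row[column])
        for column, predicate in mapping.items()
        if column in row
    ]


if __name__ == "__main__":
    for row in raw_rows:
        for triple in lift(row, mapping):
            print(triple)
```

The raw record stays untouched in its original format; only the mapping carries the added meaning, which is what allows the semantic layer to sit on top of the source datasets.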
BDE-BDVA Webinar: BDE Technical Overview
SANSA Stack
Find Big Data Europe at:
https://github.com/big-data-europe
jabeen@iai.uni-bonn.de
BigDataEurope & BDVA: Synergies
BDE vs Hadoop distributions

| | Hortonworks | Cloudera | MapR | Bigtop | BDE |
|---|---|---|---|---|---|
| File System | HDFS | HDFS | NFS | HDFS | HDFS |
| Installation | Native | Native | Native | Native | Lightweight virtualization |
| Plug & play components (no rigid schema) | no | no | no | no | yes |
| High Availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self healing, mult. failure rec. | Single failure recovery (YARN) | Multiple failure recovery |
| Cost | Commercial | Commercial | Commercial | Free | Free |
| Scaling | Freemium | Freemium | Freemium | Free | Free |
| Addition of custom components | Not easy | No | No | No | Yes |
| Integration testing | yes | yes | yes | yes | -- |
| Operating systems | Linux | Linux | Linux | Linux | All |
| Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
BDE vs Hadoop distributions
◎BDE is not built on top of existing distributions
◎Targets
o Communities
o Research institutions
◎Bridges scientists and open data
◎Multi-tier research efforts towards Smart Data
