Multi-Tenant Hadoop-as-a-Service (for free!)
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO @ Hops AB
SHUG Meetup, Stockholm, April 21st 2016
www.hops.io
@hopshadoop
(Some Slides by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB)
Shug meetup Hops Hadoop
Talk Overview
• World’s First Open Data Centre for Big Data in Luleå
• Metadata in Hadoop
• True Multi-Tenancy for Hadoop
• DEMO: Spark/Flink/Hadoop-as-a-Service
3
Vision SICS ICE research facility
4
A 2 MW datacenter research and test environment
Purpose: Increase knowledge, strengthen universities, companies and researchers
R&D institute, 5 lab modules, 3,000-4,000 servers, 2,000-3,000 square meters
What SICS ICE will offer
1. Compute capacity and tools for big data and cloud
• Hadoop/Spark/Flink-as-a-Service
2. Demonstration space for new products & solutions
3. Datacenter infrastructure for experiments and facility data
• Flexible lab modules and re-configuration
• Measurement equipment for energy, cooling, capacity
4. Competence for verticals and datacenter infrastructure
5
Status of SICS-ICE research facility
(ICE = Infrastructure and Cloud research Environment)
Phase 1 (1 room built)
• Establish test projects in a “room-in-room” commercial co-location facility
• Start of operation February 2016
• Officially Launched in April 2016
Phase 2 (Design phase)
• Design of a flexible and general research facility, summer-fall 2016
• Contracts with Akademiska Hus & E.ON
• Plan is to start build phase Spring 2017
• Plan is to start installation fall 2017
• Plan is to start operation early 2018
6
SICS-ICE Phase 1
Phase 1 room-in-room module 1
7
A Data Center Optimized for Hadoop
8
Dell servers from Hi5 in module 1
• 3600 cores
• 40 TB RAM
• Up to 7.5 petabyte storage
• 10/40 Gb/s network
• Separate management network
Hadoop-as-a-Service on SICS ICE
9
But First… MetaData in Hadoop
10
Metadata Totem Poles in Hadoop
11
Eventual Consistency
With Many Hadoop Clusters
12
Cluster 1 Cluster N
MetaData
Service
MetaData
Service
MetaData Service (Aggregator)
Eventually consistent MetaData aggregated using more
eventually consistent protocols.
MetaData in Hops Hadoop
HDFS
YARN
NDB
Projects
DataSets
Users
Provenance
Search
History
Custom MetaData
13
Case Study: Access Control as a MetaData Service
14
Access Control in Relational Databases
# Multi-tenancy for alice and bob on db1 and db2
grant all privileges on db1.* to 'alice'@'%';
grant all privileges on db2.* to 'bob'@'%';
# More fine-grained privileges
grant SELECT on db2.sensitiveTable to 'alice'@'192.168.1.2';
15
Databases ensure the consistency of security and policies using foreign keys.
“drop table db2.sensitiveTable” => delete associated privileges
Access Control in Hadoop: Apache Sentry
16
How do you ensure the consistency of the policies and the data?
[Mujumdar’15]
Policy Editor for Sentry
17
Administrators administer privileges for users
Problem: Sensitive Data needs its own Cluster
18
NSA DataSet
User DataSet
Alice can copy/cross-link between data sets
Alice has only one Kerberos Identity.
Neither attribute-based access control nor dynamic roles supported in Hadoop.
Alice
Solution: Project-Specific UserIDs
19
Project NSA
Project Users
Member of
NSA__Alice
Users__Alice
Member of
HDFS enforces
access control
How can we share DataSets between Projects?
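The naming scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the `__` separator is taken from the `NSA__Alice` and `Users__Alice` labels above, while the function names are hypothetical and not part of any Hops API.

```python
SEPARATOR = "__"  # matches the NSA__Alice / Users__Alice labels on the slide


def project_user(project, username):
    """Build the project-specific user name, e.g. NSA__Alice."""
    if SEPARATOR in project or SEPARATOR in username:
        # Assumption: names containing the separator would make the mapping ambiguous.
        raise ValueError("project and user names must not contain '__'")
    return f"{project}{SEPARATOR}{username}"


def split_project_user(hdfs_user):
    """Recover (project, username) from a project-specific user name."""
    project, username = hdfs_user.split(SEPARATOR, 1)
    return project, username


nsa_alice = project_user("NSA", "Alice")
users_alice = project_user("Users", "Alice")
```

Because the two names are different file-system users, ordinary HDFS permission checks are enough to keep Alice's NSA identity away from data owned by her Users identity.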
Sharing DataSets with HopsWorks
20
Project NSA
Project Users
Member of
DataSet
owns
Add members of Project
NSA to the DataSet group
NSA__Alice
Users__Alice
Member of
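A rough sketch of the sharing step, assuming a DataSet is backed by a group and that sharing means adding the other project's members to that group. The `GroupRegistry` class and the `Logs` DataSet are hypothetical; the real bookkeeping lives in HopsWorks and HDFS.

```python
from collections import defaultdict


class GroupRegistry:
    """Toy model of project and DataSet groups (not the HopsWorks implementation)."""

    def __init__(self):
        # group name -> set of project-specific user names
        self.members = defaultdict(set)

    def add_project_member(self, project, username):
        self.members[project].add(f"{project}__{username}")

    def create_dataset(self, owner_project, dataset):
        """Create the DataSet group and seed it with the owning project's members."""
        group = f"{owner_project}__{dataset}"
        self.members[group] |= self.members[owner_project]
        return group

    def share_dataset(self, dataset_group, target_project):
        """Share by adding the target project's current members to the DataSet group."""
        self.members[dataset_group] |= self.members[target_project]

    def can_access(self, dataset_group, hdfs_user):
        return hdfs_user in self.members.get(dataset_group, set())


registry = GroupRegistry()
registry.add_project_member("Users", "Alice")
registry.add_project_member("NSA", "Alice")
logs_group = registry.create_dataset("Users", "Logs")
before_sharing = registry.can_access(logs_group, "NSA__Alice")
registry.share_dataset(logs_group, "NSA")
after_sharing = registry.can_access(logs_group, "NSA__Alice")
```

In this sketch membership is copied at the moment of sharing, so members who join a project later would need to be added to the group separately.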
Web Application Enforces Dynamic Roles
21
Alice@gmail.com
NSA__Alice
Authenticate
Users__Alice
HopsWorks
HopsFS
HopsYARN
Projects
Secure
Impersonation
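One way to picture the dynamic role is as a lookup from the authenticated identity plus the project currently being worked in to the project-specific user that the web application then impersonates. The sketch below assumes an in-memory membership table; `MEMBERSHIPS` and `resolve_hdfs_user` are hypothetical names.

```python
# Hypothetical membership table: authenticated e-mail -> {project: user name in that project}
MEMBERSHIPS = {
    "alice@gmail.com": {"NSA": "Alice", "Users": "Alice"},
}


def resolve_hdfs_user(email, active_project, memberships=MEMBERSHIPS):
    """Return the project-specific user to impersonate for this request."""
    projects = memberships.get(email.lower(), {})
    if active_project not in projects:
        raise PermissionError(f"{email} is not a member of project {active_project}")
    return f"{active_project}__{projects[active_project]}"


as_nsa = resolve_hdfs_user("Alice@gmail.com", "NSA")
as_users = resolve_hdfs_user("Alice@gmail.com", "Users")
```

Only one project is active per request, so the same person never holds both identities at once.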
User
• Authentication Provider
- JDBC Realm
- 2-Factor Authentication
- LDAP
22
Project
• Users
- Roles: Owner, Data Scientist
• DataSets
- Home project
- Can be shared
23
Project Roles
• Data Owner Privileges
- Import/Export data
- Manage Membership
- Share DataSets
• Data Scientist Privileges
- Write code
- Run code
- Request access to DataSets
24
We delegate administration of privileges to users
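The fixed role-to-privilege mapping could be written down as a small lookup table. The sketch lists only the privileges named on this slide; whether a Data Owner also holds the Data Scientist privileges is not stated, so that is left out. All identifiers are illustrative.

```python
from enum import Enum, auto


class Privilege(Enum):
    IMPORT_EXPORT_DATA = auto()
    MANAGE_MEMBERSHIP = auto()
    SHARE_DATASETS = auto()
    WRITE_CODE = auto()
    RUN_CODE = auto()
    REQUEST_DATASET_ACCESS = auto()


# Fixed mapping: roles are predefined, so no administrator edits this per user.
ROLE_PRIVILEGES = {
    "Data Owner": frozenset({
        Privilege.IMPORT_EXPORT_DATA,
        Privilege.MANAGE_MEMBERSHIP,
        Privilege.SHARE_DATASETS,
    }),
    "Data Scientist": frozenset({
        Privilege.WRITE_CODE,
        Privilege.RUN_CODE,
        Privilege.REQUEST_DATASET_ACCESS,
    }),
}


def is_allowed(role, privilege):
    """Check a privilege against the fixed table; unknown roles get nothing."""
    return privilege in ROLE_PRIVILEGES.get(role, frozenset())
```

With the table fixed, the only per-project decision is which role each member gets, and that decision is made by the project's own Data Owner.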
Per Project CPU and Storage Quotas
• 300 GB per Project
• 1000 CPU mins
• Uber-Style Pricing
- Elastic Demand Curve
25
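A quota check along these lines might look as follows. The two limits come from the slide; the `ProjectUsage` class and the helper functions are assumptions for illustration, and the pricing model is left out because the slide gives no formula for it.

```python
from dataclasses import dataclass

STORAGE_QUOTA_GB = 300    # per project, from the slide
CPU_QUOTA_MINUTES = 1000  # per project, from the slide


@dataclass
class ProjectUsage:
    storage_gb: float = 0.0
    cpu_minutes: float = 0.0


def can_store(usage, extra_gb):
    """True if writing extra_gb keeps the project within its storage quota."""
    return usage.storage_gb + extra_gb <= STORAGE_QUOTA_GB


def can_run(usage, estimated_cpu_minutes):
    """True if a job with this CPU estimate fits in the remaining CPU quota."""
    return usage.cpu_minutes + estimated_cpu_minutes <= CPU_QUOTA_MINUTES


usage = ProjectUsage(storage_gb=290.0, cpu_minutes=990.0)
```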
Sharing DataSets between Projects
27
The same as Sharing Folders in Dropbox
Delegate Access Control to HDFS
• HDFS enforces access control
- UserID per Project
- GroupID per Project and DataSet
• Metadata Integrity using Foreign Keys
- Removing a project removes all users, groups, extended metadata, and (optionally) DataSets.
28
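The foreign-key idea can be tried out with any relational database. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in; the table and column names are made up for the example and are not the Hops schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE project_user (
    hdfs_user  TEXT PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id) ON DELETE CASCADE
);
CREATE TABLE project_group (
    hdfs_group TEXT PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO project (id, name) VALUES (1, 'NSA')")
conn.execute("INSERT INTO project_user VALUES ('NSA__Alice', 1)")
conn.execute("INSERT INTO project_group VALUES ('NSA__Genomes', 1)")

# Removing the project row removes its users and groups in the same statement.
conn.execute("DELETE FROM project WHERE name = 'NSA'")

remaining_users = conn.execute("SELECT COUNT(*) FROM project_user").fetchone()[0]
remaining_groups = conn.execute("SELECT COUNT(*) FROM project_group").fetchone()[0]
```

Nothing in application code has to remember to clean up the dependent rows, which is the point the slide makes about metadata integrity.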
Free Text Search with Consistent Metadata
29
Free-Text Search
Distributed Database
ElasticSearch
The Distributed Database is the Single Source of Truth.
Foreign keys ensure the integrity of Metadata.
MetaData Designer
MetaData Entry
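The "single source of truth" arrangement can be illustrated with a toy store and a search index that is only ever derived from it. How the real system keeps ElasticSearch in step with the database is not described on the slide, so the full rebuild below is an assumption chosen for simplicity, and all names are hypothetical.

```python
import re
from collections import defaultdict


class MetadataStore:
    """Stand-in for the distributed database: the only place writes go."""

    def __init__(self):
        self.rows = {}  # dataset name -> free-text description

    def put(self, name, description):
        self.rows[name] = description

    def delete(self, name):
        self.rows.pop(name, None)


def build_index(store):
    """Derive an inverted index (token -> dataset names) from the store."""
    index = defaultdict(set)
    for name, description in store.rows.items():
        for token in re.findall(r"\w+", f"{name} {description}".lower()):
            index[token].add(name)
    return index


def search(index, term):
    return sorted(index.get(term.lower(), set()))


store = MetadataStore()
store.put("genomes", "Human genome reads")
store.put("logs", "Web server logs")
index = build_index(store)
hits_before = search(index, "genome")

store.delete("genomes")
index = build_index(store)  # the index is rebuilt from the store, never edited directly
hits_after = search(index, "genome")
```

Since the index is always recomputed from the store, a deleted DataSet cannot linger in search results.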
The NoteBook Proxy Wars
30
Demo
31
Short-Term RoadMap
• Multi-tenant Kafka
- Per-project Topics
• Oozie Workflow Editor
• Genomics Support with ADAM/Spark
• Tiered Storage: Hot Data, Normal, Archived
• Improved Data Ingress
- Sharing Public DataSets Globally using P2P technology
32
The Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde,
Gautier Berthou, Salman Niazi, Mahmoud Ismail,
Kamal Hakimzadeh, Ermias Gebremeskel,
Theofilos Kakantousis, Johan Svedlund Nordström,
Someya Sayeh, Vasileios Giannokostas,
Antonios Kouzoupis, Misganu Dessalegn, Rizvi Hasan,
Ahmad Al-Shishtawy, Ali Gholami, Paul Mälzer.
Alumni: K. “Sri” Srijeyanthan, Steffen Grohsschmiedt,
Alberto Lorente, Andre Moré,
Stig Viaene, Hooman Peiro, Evangelos Savvidis,
Jude D’Souza, Qi Qi, Gayana Chandrasekara,
Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos,
Peter Buechler, Pushparaj Motamari, Hamid Afzali,
Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
Conclusions
• HopsWorks is providing a world first: Hadoop-as-a-Service to researchers and industry.
• Workshop on 12th May, 17.30 – 20.00 in SICS,
6th Floor of the Electrum Building, Kista.
Register at www.hops.io/?q=news
• Join the team – talk to me!
34
www.hops.io
www.hops.site

Editor's Notes

  • #2: This is an old brewery, which is a synonym for “Hopworks”, the new product I will talk about today. I am going to talk to you about a product we have been developing for the last few years. It’s “Hadoop for Humans”. We support the main data-parallel processing platforms for Hadoop, and we see Apache Flink as one of those platforms. We expect there to be an ecosystem of platforms, rather than a one-size-fits-all platform. Anyhow, I have been coordinating a large pan-European project called ‘BiobankCloud’ for the last 3 years, and one of the problems we have been trying to solve is: how can different medical studies with sensitive data safely reside in the same Hadoop platform? Studies need to be fully isolated from one another. That is, we want true multi-tenancy in Hadoop. Now, I am going to try and convince you that HopsWorks provides scalable multi-tenancy, and other distros don’t. I will try with both rational argument and more classical approaches. Let’s start with a classical approach: marketing. Marketing people say you won’t believe me, even if I have a “Dr” before my name. Not good enough. They say one way around this problem is to get celebrity endorsements. Now, we have a picture of Steve Irwin, the crocodile hunter, on our website, but we couldn’t get him, for obvious reasons… Now here’s our “Logo”.
  • #3: Spotify is not the only group in Sweden building a next generation platform for Big Data.
  • #6: Support network of analytics and cloud researchers Researchers & engineers for datacenter infrastructure support
  • #13: 2 main reasons to have separate Hadoop Clusters: Scale and Security. Scale: we support 5x bigger clusters. Separate Clusters for Sensitive DataSets. Sharing data is copying data. Control going down is problematic, as you are making decisions often based on stale data.
  • #14: weirwood tree
  • #16: What happens to the privileges if I remove “db2.sensitiveTable”? Foreign keys maintain the integrity of policies and data.
  • #17: What’s wrong with this picture? No Spark or Flink!
  • #18: The General Data Protection Regulation says “Data Owners” should administer privileges – not sysadmins.
  • #19: Dynamic Roles is what I am talking about for the white-hat and black-hats out there.
  • #20: Dynamic Roles is what I am talking about for the white-hat and black-hats out there.
  • #21: Dynamic Roles is what I am talking about for the white-hat and black-hats out there.
  • #22: Dynamic Roles is what I am talking about for the white-hat and black-hats out there. We use secure impersonation in Hadoop to access HDFS and launch YARN jobs
  • #24: Privileges – upload/download data, run analysis jobs Like RBAC solution. All access via HopsWorks.
  • #25: Fixed set of privileges for the roles No need for administrator to manage roles -> Users mapping
  • #29: Convention for directories