HUAWEI TECHNOLOGIES CO. LTD.
www.huawei.com
Federated Mesos Clusters for Global Data Centers Designs
Krishna M Kumar, Lead Architect, Huawei Cloud
Contents
• Why Federation?
• Multi-Master Federation Approach
• Demo
What is Federation in General?
“A federation is a group of computing or network providers agreeing upon standards of operation in a collective fashion.” - wiki
Regional Authority: An autonomously working body.
Federal Layer: Helps the regional authorities cooperate with each other.
So cloud federation is the union of multiple cooperating data centers, spread across geographies, serving a common purpose.
[Diagram: a Federal Layer connecting four Regional Authorities]
Federating with Different Service Providers
Federating Datacenters within a Service Provider
Why Federation?
• High availability.
• No vendor lock-in.
• Cloud bursting (accommodating spikes in demand).
• Load balancing across geographies.
• Application upgrade/migration.
• Policy-based deployment.
• Economic benefits among providers.
Cloud Federation
Ubernetes (Google)
Cloud Federation
Nomad (HashiCorp)
[Diagram: Nomad clusters connected to each other by gossip]
Similar to Google’s Borg
Docker Swarm (Cluster Federation)
Cloud Federation
What about Federation for Mesos Clusters?
Some Mesos federation designs considered in our research lab but dropped:
• Design 1: Nested/Proxy approach
• Design 2: Multi-Zone Mesos Super Cluster
• Design 3: Mesos Global Manager (MGM)
Preferred Model: Multi-Master Federation Approach
Why Multi-Master?
It’s really hard to control superheroes if you are not one. Ask this man!!!
Phew!!!
Big Day…
Benefits of Multi-Master
Each data center is a superhero that cooperates with the others.
• No single point of failure.
• DCs cooperate with each other using a gossip protocol.
• The framework gets fast feedback because it is connected to all the masters directly. The framework itself has to be federated.
• Centralized data store layer.
• A simple policy engine to demonstrate cloud bursting.
How does it work?
Broad Overview
• HashiCorp’s Consul stores all the policy information.
• Each Mesos master is accompanied by a ‘Gossiper’, which represents that Mesos-run data center in the federation.
• Gossipers talk to each other in the federation and learn the current policy.
• Gossipers negotiate with each other and inform their respective masters which framework deserves the offers.
[Diagram: Data Centers 1 to 4, each with a Master and a Gossiper, together with a Framework and Consul]
Overview of Consul and Gossiper Interaction
• Gossipers talk to each other using HashiCorp’s memberlist library.
• HashiCorp’s Consul uses the same memberlist library.
[Diagram: Data Centers 1 to 4, each with a Gossiper and a Consul]
Federated Master
FedAlloc: An allocation module derived from the master’s default DRF module.
FedComm: A Mesos module of type Anonymous that the Gossiper talks to.
[Diagram: the Master hosts FedAlloc (allocation module) and FedComm (anonymous module); the Gossiper sits next to the Master]
Internals of Federated Master
[Diagram: FedAlloc and FedComm are plug-ins of the Mesos master. FedComm (write only) is fed by the Gossiper; FedAlloc (read only) does a conditional wait on the shared table below.]
F. Id      Suppress by FW   Suppress by Federation
1122001    True             True
1122007    True             False
1122005    False            True
1122004    False            False
Internals of Federated Master (Cont.)
[Diagram: the same suppress table (F. Id, Suppress by FW, Suppress by Federation) as on the previous slide, now guarded by a mutex. FedComm (write only) and FedAlloc (read only) are plug-ins of the Mesos master; the Gossiper feeds FedComm.]
FedComm (TCP read on Gossiper):
1. Lock table
2. Write
3. Unlock table
4. Signal condition
FedAlloc (condition variable):
1. Lock table
2. Read
3. Call suppress()/revive()
4. Unlock
FedAlloc is invoked automatically once the condition variable is signaled.
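The lock/write/signal and lock/read/act steps on this slide can be sketched as below. This is only an illustration in Python, not the actual Mesos modules: the class SuppressTable, its method names, and the suppress/revive callbacks are hypothetical, and the sketch assumes that FedAlloc is the side waiting on the condition variable. Python requires the signal to be sent while the lock is still held, so the "unlock" and "signal" steps appear in the opposite order from the slide.

```python
import threading


class SuppressTable:
    """Shared table: framework id -> 'suppress by federation' flag (hypothetical)."""

    def __init__(self):
        # threading.Condition bundles the mutex and the condition variable.
        self._cond = threading.Condition()
        self._flags = {}
        self._dirty = False  # True while a write has not been consumed yet

    def write(self, framework_id, suppressed):
        """FedComm side: lock table, write, signal condition, unlock."""
        with self._cond:
            self._flags[framework_id] = suppressed
            self._dirty = True
            self._cond.notify()

    def wait_and_apply(self, suppress, revive, timeout=None):
        """FedAlloc side: wait for the signal, read, call suppress()/revive(), unlock.

        Returns False if the timeout expired before any write arrived.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._dirty, timeout):
                return False
            self._dirty = False
            for framework_id, suppressed in sorted(self._flags.items()):
                (suppress if suppressed else revive)(framework_id)
            return True
```

The dirty flag guards against a lost wake-up: if FedComm writes before FedAlloc starts waiting, the update is still picked up on the next wait.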
Gossiper
Anon Client: Instructs the master when to start and when to stop sending offers.
MasterInfo: Periodically performs an HTTP GET on its Mesos master to refresh that master’s statistics.
HTTP: HTTP server that exposes some REST APIs.
Member List (ML): Module that actually implements the gossip layer.
Consul Lib: Library that talks to Consul and replicates to other DCs. It also implements a watch for policy updates.
Policy Engine: Reads from Consul and interprets two policies:
1. Max Threshold
2. Next Max DC
[Diagram: the Gossiper, containing the HTTP, MasterInfo, Anon Client, Policy Engine, Consul Lib and ML modules]
Master-Gossiper Interaction
[Diagram: the Master (FedAlloc, FedComm) next to the Gossiper (HTTP, MasterInfo, Anon Client, Policy Engine, Consul Lib, ML), shown together with Consul and Data Centers 2 to 5]
Sequence Diagram
M1: Mesos master managing our DC1
M2: Mesos master managing our DC2
M3: Mesos master managing our DC3
Sample policy: If we run out of resources in our DC, burst into the next cloud.
[Sequence diagram between the Framework and M1, M2, M3. Messages shown: Register to Master 1, Register to Master 2, Register to Master 3; Offer 1, Launch Task 1; Offer 2, Launch Task 2; Offer 3, Launch Task 3; and several out-of-resource (OOR) notices]
Gossiper - Exchange Framework
Broadcast{
Framework 11
Framework 7
Framework 5
}
Broadcast{
Framework 1
Framework 7
Framework 10
}
Broadcast{
Framework 8
Framework 7
Framework 4
}
Broadcast{
Framework 8
Framework 7
Framework 4
}
[Diagram: Gossipers 1 to 4 exchanging the framework broadcasts above]
Gossiper - Exchange Resource Information
[Diagram: Gossipers 1 to 4 exchanging the resource broadcasts below]
Broadcast{
CPU: 4
RAM: 16GB
Disk: 2TB
}
Broadcast{
CPU: 4
RAM: 8GB
Disk: 80GB
}
Broadcast{
CPU: 2
RAM: 4GB
Disk: 1TB
}
Broadcast{
CPU: 8
RAM: 4GB
Disk: 1.2TB
}
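A Gossiper that receives these broadcasts needs to keep some per-data-center picture of resources. The sketch below is one hedged way to do that in Python; FederationView, Resources and to_gb are hypothetical names, the DC1 to DC4 labels are an assumption (the slide does not say which Gossiper sent which broadcast), and 1 TB is taken as 1024 GB for illustration only.

```python
from dataclasses import dataclass

# Assumption: 1 TB = 1024 GB; the slide does not say which convention is used.
GB_PER_UNIT = {"GB": 1.0, "TB": 1024.0}


def to_gb(size: str) -> float:
    """Convert a size such as '16GB' or '1.2TB' to gigabytes."""
    number, unit = size[:-2], size[-2:].upper()
    return float(number) * GB_PER_UNIT[unit]


@dataclass
class Resources:
    cpu: int
    ram_gb: float
    disk_gb: float


class FederationView:
    """One Gossiper's picture of the resources reported by every data center."""

    def __init__(self):
        self.by_dc = {}

    def on_broadcast(self, dc: str, message: dict) -> None:
        # The newest broadcast from a data center replaces the older one.
        self.by_dc[dc] = Resources(
            cpu=int(message["CPU"]),
            ram_gb=to_gb(message["RAM"]),
            disk_gb=to_gb(message["Disk"]),
        )

    def total(self) -> Resources:
        """Sum the reported resources over all known data centers."""
        return Resources(
            cpu=sum(r.cpu for r in self.by_dc.values()),
            ram_gb=sum(r.ram_gb for r in self.by_dc.values()),
            disk_gb=sum(r.disk_gb for r in self.by_dc.values()),
        )


view = FederationView()
# Values copied from the slide; the DC labels are assumed.
view.on_broadcast("DC1", {"CPU": 4, "RAM": "16GB", "Disk": "2TB"})
view.on_broadcast("DC2", {"CPU": 4, "RAM": "8GB", "Disk": "80GB"})
view.on_broadcast("DC3", {"CPU": 2, "RAM": "4GB", "Disk": "1TB"})
view.on_broadcast("DC4", {"CPU": 8, "RAM": "4GB", "Disk": "1.2TB"})
```

Keeping only the latest broadcast per data center is a simplification; a real gossip layer would also have to deal with stale or out-of-order messages.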
Gossiper - Exchange Out Of Resource
[Diagram: an Out of Resource (OOR) message exchanged among Gossipers 1 to 4]
Minimal Policy Engine Implemented for this Experiment
• We needed a minimal policy engine to demonstrate the cloud-bursting scenario.
• This policy engine is embedded in the Gossiper and can interpret only two simple rules.
• The content of the policy engine is an array of Policy objects.
• Each Policy object has a set of rules that need to be applied.
• We use HashiCorp’s Consul to store the policy, which is replicated across data centers to avoid a single point of failure.
• Any policy update in one DC is instantly propagated to the others. The Gossiper watches the Consul key-value store and keeps the latest copy of the policy.
{
  "Name": "Policy_One",
  "Rules": [{
    "Name": "MinMax",
    "Priority": 1,
    "Scope": "",
    "Content": {
      "MinOrMax": "MAX"
    }
  }, {
    "Name": "Threshold",
    "Priority": 4,
    "Scope": "",
    "Content": {
      "ResourceLimit": 90
    }
  }]
}
Simple policy with two rules
Rule 1:
• When cloud bursting, which DC should be chosen?
• The one with the maximum or the minimum resources?
Rule 2:
• When should cloud bursting be performed?
• At what resource percentage?
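To make the two rules concrete, here is a small Python sketch that loads the policy shown above and answers both questions. It is an illustration only, not the Gossiper's policy engine: the function names are hypothetical, the sketch assumes ResourceLimit is the used-resource percentage at or above which bursting starts, and it assumes "MAX" means "pick the DC with the most available resources". The Priority and Scope fields are ignored here.

```python
import json

# Policy document as shown on the slide.
POLICY = json.loads("""
{
  "Name": "Policy_One",
  "Rules": [
    {"Name": "MinMax", "Priority": 1, "Scope": "",
     "Content": {"MinOrMax": "MAX"}},
    {"Name": "Threshold", "Priority": 4, "Scope": "",
     "Content": {"ResourceLimit": 90}}
  ]
}
""")


def rule_content(policy: dict, name: str) -> dict:
    """Return the Content of the rule with the given Name."""
    for rule in policy["Rules"]:
        if rule["Name"] == name:
            return rule["Content"]
    raise KeyError(name)


def should_burst(policy: dict, used_percent: float) -> bool:
    """Rule 2: burst once local usage reaches the threshold (assumed semantics)."""
    return used_percent >= rule_content(policy, "Threshold")["ResourceLimit"]


def choose_target(policy: dict, available_by_dc: dict) -> str:
    """Rule 1: pick the DC with the most (MAX) or least (MIN) available resources."""
    pick = max if rule_content(policy, "MinMax")["MinOrMax"] == "MAX" else min
    return pick(available_by_dc, key=available_by_dc.get)
```

For example, with a hypothetical availability map such as {"DC2": 4, "DC3": 2, "DC4": 8} (say, free CPUs), the MAX policy selects DC4, and a local usage of 95 percent would trigger bursting while 50 percent would not.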
Demo
When all federated data centers are up and running
Map Legends for User Interface
When one of the federated data centers goes out of resources
When all the federated data centers are down
Resource Information of Selected Data Center
Resource Utilization of all the Data Centers
Challenges / Future Work Planned
• Policy engine with enhanced load balancing/affinity.
• Optimize the gossip protocol for data consistency across clusters.
• Network throughput/latency.
• Service discovery (e.g. DNS).
• Consolidated monitoring, health, alerts, etc.
• Security and compliance in the federation.
• Work with the Mesos community for further refinement.
Thank you
www.huawei.com
Copyright©2016 Huawei Technologies Co. Ltd. All Rights Reserved.
The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and
operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to
differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and
constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
