(R)evolution of the computing continuum - A few challenges

(R)evolution of the computing
continuum
A few challenges…
International Symposium on Stabilization, Safety, and Security of Distributed Systems
F. Desprez (INRIA), A. Lebre (IMT Atlantique)

Agenda
• Introduction, context, and research issues
• Some recent challenges/scientific issues addressed by the Stack team
1. How to operate a geo-distributed infrastructure
2. Services placement
3. Decentralized indexation
• Experimental infrastructures
• Conclusions

Why do we need a computing continuum ? Mahadev Satyanarayanan

Introduction and context
• Huge increase of data generated (2.5 exabytes of new data generated each
day)
• More than 50 billions of connected devices around the world
• Moving the data from IoT devices to the cloud is an issue
• New applications (time-sensitive, location aware) with ultra-low latencies
requirements
• Privacy issues
• Solution: A computing paradigm closer to the data is generated and used
Impossible !

Edge computing applications
• Autonomous driving
• Security apps
• IoT applications
• Location services
• Network functions
• Industry 4.0
• Edge intelligence

Several ‘flavors’ of distributed computing
• Cloud computing
• Ubiquitous, on-demand access to shared computing resources. Virtualization. Elasticity. IaaS, PaaS, SaaS.
• Fog computing
• « Horizontal system-level architecture that distributes computing, storage, control, and networking closer to the
users along a cloud-to-thing continuum » (OpenFog consortium).
• Mobile computing
• Mobile devices, resource constrained devices, connected though Bluetooth, Wifi, ZigBee, …
• Mobile cloud computing (MCC)
• An infrastructure where both the data storage and data processing occur outside of the mobile device, bringing
mobile computing applications to not just smartphone users but a much broader range of mobile subscribers.
• Mobile and ad hoc cloud computing
• Mobile devices in an ad hoc mobile network form a highly dynamic network topology; the network formed by the
mobile devices is highly dynamic and must accommodate for devices that continuously join or leave the
network.
All one needs to know about fog computing and related edge computing paradigms: A complete survey, A. Yousefpour et al., Journal of Syst. Arch., Vol 98, Sep. 2019

Several ‘flavors’ of distributed computing, contd
• Edge computing
• « Computation done at the edge of the network through small data centers that are close to users »
(OpenEdge Computing).
• Multi-access Edge Computing (MEC)
• « A platform that provides IT and cloud-computing capabilities within the Radio Access Network (RAN) in
4G and 5G, in close proximity to mobile subscribers » (ETSI).
• Cloudlet computing
• Trusted resource-rich computer or a cluster of computers with strong connection to the Internet that is
utilized by nearby mobile devices (Carnegie Mellon University)
• Mist computing
• Dispersed computing at the extreme edge (the IoT devices themselves).
All one needs to know about fog computing and related edge computing paradigms: A complete survey, A. Yousefpour et al., Journal of Syst. Arch., Vol 98, Sep. 2019

Some common characteristics
• Low Latency
• Nodes are closer to the end users and can offer a faster analysis and response to the data generated and
requested by the users
• Geographic Distribution
• Geo-distributed deployment and management,
• Heterogeneity
• Collection and processing of information obtained from different sources and collected by several means of
network communication,
• Interoperability and Federation
• Resources must be able to interoperate with each other and services and applications must be federated
across domains,
• Real-Time Interactions
• Services and applications involve real-time interaction, not just batch processing,
• Scalability
• Fast detection of variation in workload’s response time and of changes in network and device conditions,
supporting elasticity of resources.
Orchestration in Fog Computing: A Comprehensive Survey, B. Costa et al., ACM Computing Surveys, Vol. 55, No. 2, Jan. 2022.

Some research issues
Application lifecycle management (initial deployment, configuration, reconfiguration,
maintenance)
• Abstracting the description of the whole application structure, globally optimize the resources used with respect to multi-criteria
objectives (price, deadline, performance, energy, etc.), models and associated languages to describe applications, their
objective functions, placement and scheduling algorithms supporting system and application-level criteria, ...
Infrastructure management
• Virtualization (hyper-converged 2.0 architecture, complexity, heterogeneity, dynamicity, scaling and locality), storage
(compromise between moving computation vs. data, files, BLOB, key/value systems, geo-distributed graph database, …), and
administration (intelligent orchestrator, geo-distributed scale, automatically adaption to users' needs, ...)
Hardware
• Trusted hardware solutions, architectural support for high level features, energy reduction solutions, new accelerators, …
Security
• Vulnerabilities in VMs, hypervisors and orchestrators, virtual network technologies (SDN, NFV), programming or access
interfaces, adapting security policies to a more complex environment, ...
Energy
• End-to-end energy analysis and management of large-scale hierarchical Cloud/Edge/Fog infrastructures on processing, network
and storage aspects, trade-offs between energy efficiency and other performance metrics in virtualized infrastructures, Eco-
design of digital applications and services, ...
• …

CLOUDLET/FoG/Edge/CLOUD-To-IOT/CONTINUUM Computing
Inter Micro DCs latency
[50ms-100ms]
Edge
Frontier
Edge
Frontier
Extreme Edge
Frontier
Domestic network
Enterprise network
Wired link
Wireless link
Cloud Latency
> 100ms
Cloud Computing
Micro/Nano DC
Intra DC latency
< 10ms
Hybrid network

CHALLENGE 1: HOW TO GEO-DISTRIBUTE
CLOUD APPLICATIONS TO THE EDGE

Defacto open source standard to administrate/virtualize/use
resources of one DC
Scalability?
Latency/throughput impact?
Network partitioning issues?
…
From LAN to WAN? ⇒
Bring Cloud applications to the Edge
INITIATING THE DEBATE WITH OPENSTACK (2016-2021)

Inter Micro DCs latency
[50ms-100ms]
Edge
Frontier
Edge
Frontier
Extreme Edge
Frontier
Domestic network
Enterprise network
Wired link
Wireless link
Cloud Latency
> 100ms
Cloud Computing
Micro/Nano DC
Intra DC latency
< 10ms
Hybrid network
WANWIDE
Collaborative?
INITIATING THE DEBATE WITH OPENSTACK (2016-2021)

13 Millions of LOCs,186 subservices
Designed for a single location
OPENSTACK (THE DEVIL IN
DETAILS)

NOVA
GLANCE
foo
GET
foo
NOVA
GLANCE
NOVA
GLANCE
VM a = openstack server create —image foo
COLLABORATION: ADDITIONAL PIECES OF CODE IS REQUIRED

Collaboration code is
required in every Service
A broker per service
must be implemented
DB values might be
location dependant
COLLABORATION: ADDITIONAL PIECES OF CODE IS REQUIRED

Geo-distributed principles
Collaborations kinds
A SERVICE DEDICATED TO ON DEMAND COLLABORATIONS

The SCOPE lang: Andy defines the scope of the request into the CLI.
The scope specifies where the request applies.

openstack server create my-vm ——flavor m1.tiny --image cirros.uec
—-scope {compute: Nantes, image: Paris}
OpenStack Summit Berlin - Nov 2018
Hacking the Edge hosted by Open Telekom Cloud
• A complete model in order to enhance the scope description with sites compositions (e.g., AND, OR)
• List VMs on Nantes and Paris
openstack server list --scope {compute:Nantes&Paris}

https://gitlab.inria.fr/discovery/cheops (Work in Progress)
A CHEOPS AS A BUILDING BLOCK TO DEAL WITH GEO-DISTRIBUTION

• Expose consistency policies at to the user level (extend the scope syntax)
• Manage the dependencies between resources
• Notion of replication set: manage a fixed pool of resources with an automatic control loop
(implemented in a geo-distributed way at the Cheops level).
Replication overview/challenge

Manage partition issues using appropriate replication/aggregation policies
Cross overview/challenge

A bit more complicated than it looks like…
Delavergne, Marie; Antony, Geo Johns; Lebre, Adrien
Cheops, a service to bloud away Cloud applications to the Edge, To appear in ICSOC 2022
TOWARD A GENERALISATION OF THE SERVICE (OpenStack/Kubernetes/…)

CHALLENGE 2: SERVICE PLACEMENT

Service placement problems
How to assign the IoT applications to computing nodes (Fog nodes) which are distributed in a Fog environment ?
• Different kinds of applications
• Monolithic service, data pipeline, set of inter-dependent components, Directed Acyclic Graphs (DAGs)
• Several constraints
• Computing and networking resources are heterogeneous and dynamic, Computing and network resources are not always available,
Service cannot be processed everywhere
• Different approaches
• Centralized or distributed approaches
• Online or offline placement
• Static or dynamic
• Mobility support
• Different performance criterions
• Execution time, quality of service, latency, energy consumption
• Problem formulations
• Linear programming: Integer Linear Programming (ILP), Integer Nonlinear Programming (INLP), Mixed Integer Linear Programming
(MILP), Mixed-integer non-linear programming (MINLP), Mixed Integer Quadratic Programming (MIQP)
• Constraint programming, Markov decision process, stochastic optimization, potential games, …
An overview of service placement problems in Fog and Edge Computing. F. Ait-Salaht, F. Desprez, and. A. Lebre. ACM Computing Surveys, Vol. 53, Issue 3, May 2021

Service Placement Problem using Constraint programming
and Choco solver
• Goals
• Elaborate a generic and easy to upgrade model
• Define a new formulation of the placement problem considering a general definition of service and
infrastructure network through graphs using constraint programming
Service Placement in Fog Computing Using Constraint Programming. F. Ait-Salaht, F. Desprez, A. Lebre, C. Prud’homme and M. Abderrahim. IEEE

System model and problem formulation
• A directed graph G = <V,E> represents the Network
• V: set of vertices or nodes (server)
• E: set of edges or arcs (connections)
• Each node defines CPU and RAM capacities
• Each arc defines a latency and a bandwidth
capacity
• Infrastructure
• An application is an ordered set of components
• A component requires CPU/RAM to work
• A component can send data (bandwidth, latency)
• Some components are fixed (f-ex., cameras)
• Application

• CPU capacity of each node is
respected
• Same goes with RAM capacity
• Bandwidth capacity is respected on
arcs too
• Latencies are satisﬁed
Placement (mapping)
Assign services (each component and each edge) to network infrastructure
(node and link) such that:
System model and problem formulation

Constraint Programming model (CP)
What is CP ?
• CP stands for Constraint Programming
• CP is a general purpose implementation of Mathematical Programming
• MP theoretically studies optimization problems and resolution techniques
• It aims at describing real combinatorial problems in the form of Constraint
Satisfaction Problems and solving them with Constraint Programming techniques
• The problem is solved by alternating constraint ﬁltering algorithms with a search
mechanism
• Modeling steps (3)
• Declare variables and their domain
• Find relation between them
• Declare a objective function, if any

34
Variables and domains

Constraints

Constraints
Constraints on nodes

Constraints
Constraints on Arcs

Constraints
Constraints between nodes and arcs

Experiment 1
Infrastructure Smart bell application
91 fog
nodes
86
sensors
• Requirements
‣ Resources: CPU, RAM, DISK
‣ Networking: Latency and Bandwidth
‣ Locality
• Objective
‣ Minimize average latency
Implementation of the model on the Choco solver (Free Open-Source Java library dedicated to Constraint Program

Infrastructure Smart bell application
91 fog
nodes
86
sensors
Experiment 1

Infrastructure
Applications
(a) Storage Application, (b) Smart Bell application, and (c) A face recognition application
Greek Forthnet topology
60
PoPs
59 links
Experiment 2

(a) For G with 120 nodes (b) For G with 300 nodes
(c) For G with 600 nodes (d) For G with 1200 nodes
Experiment 2

CHALLENGE 3: THE INDEXING PROBLEM

Where is the content I’m looking for?
Locating the closest replica of a specific content requires indexing every live replica along with
its location
Existing solutions
• Remote services (centralized index, DHT)
In contradiction with the objectives of Edge infrastructures:
The indexing information might be stored in a node that is far away
(or even unreachable) while the replica could be in the vicinity
• Broadcast
• Maintaining such an index at every node would prove overly costly in terms of memory
and traffic (it does not confine the traffic)
• Epidemic propagation

Epidemic Propagation and Dynamic logical partitioning

Challenges
How to maintain such a logical partitioning in a dynamic environment
where…
• Nodes can ADD or DELETE content any time (no synchronization)
• Nodes can join or leave the system at any time (without any warning)
…while limiting the scope of transferred information as much as possible

Challenges
Propagating messages naively is not sufficient to guarantee consistent partit

Lock Down the Traffic of Decentralized Content Indexing at the Edge, B. Nedelec et al., ICA3PP 2022
A preliminary step toward a complete solution
• Definitions of the properties that guarantee
decentralized consistent partitioning in dynamic
infrastructures.
• Demonstration that concurrent creation and
removal of partitions may impair the propagation
of control information
• Proposal of a first algorithm solving this dynamic
partitioning problem (and its evaluation by
simulations)

Experimental infrastructures
SILECS/SLICES: Super Infrastructure for Large-Scale Experimental Computer Science
• The Discipline of Computing: An Experimental Science
• Studied objects are more and more complex (Hardware, Systems, Networks, Programs, Protocols, Data,
Algorithms, …)
• A good experiment should fulfill the following properties
• Reproducibility: must give the same result with the same input
• Extensibility: must target possible comparisons with other works and extensions (more/other processors,
larger data sets, different architectures)
• Applicability: must define realistic parameters and must allow for an easy calibration
• “Revisability”: when an implementation does not perform as expected, must help to identify the reasons
• ACM Artifact Review and Badging

SILECS/Grid’5000
• Testbed for research on distributed systems
• Born in 2003 from the observation that we need a better and larger testbed
• HPC, Grids, P2P, and now Cloud computing, and BigData systems
• A complete access to the nodes’ hardware in an exclusive mode (from one node to
the whole infrastructure)
• Dedicated network (RENATER)
• Reconfigurable: nodes with Kadeploy and network with KaVLAN
• Current status
• 8 sites, 36 clusters, 838 nodes, 15116 cores
• Memory: ~100 TiB RAM + 6.0 TiB PMEM, Storage: 1.42 PB (1515 SSDs and 953
HDDs on nodes), 617.0 TFLOPS (excluding GPUs)
• Diverse technologies/resources (Intel, AMD, Myrinet, Infiniband, two GPU clusters,
energy probes)
• Some Experiments examples
• In Situ analytics, Big Data Management,
• HPC Programming approaches, Batch scheduler optimization
• Network modeling and simulation
• Energy consumption evaluation
• Large virtual machines deployments

SILECS/FIT
Providing Internet players access to a
variety of fixed and mobile
technologies and services, thus
accelerating the design of advanced
technologies for the Future Internet

Experiments
• Discovering resources from their description
• Reconfiguring the testbed to meet experimental
needs
• Monitoring experiments, extracting and
analyzing data
• Controlling experiments: API

Distributed Storage for a Fog/Edge infrastructure
based on a P2P and a Scale-Out NAS

FogIoT Orchestrator: an Orchestration System for IoT
Applications in Fog Environment

European dimension
ESFRI project/infrastructure since 2021

Conclusion
The disconnection is the norm
• High latency, unreliable connections,
• Logical partitioning (Edge areas/zones)
A (r)evolution of distributed systems and networks?
• Algorithms, (distributed) system building blocks should be revised to satisfy geo-
distributed constraints
• Decentralized vs collaborative (e.g. DHT, network ASes)

Questions / THANKS
Post-scriptum
• We are looking for students, Phd candidates,
postdocs, engineers, researchers, associate-
professors (AI/infrastructure experts, this is trendy ;-)),
use-cases, fundings, collaborations…
• We propose … a lot of fun and work!
http://stack.inria.fr

(R)evolution of the computing continuum - A few challenges

More Related Content

What's hot (20)

Similar to (R)evolution of the computing continuum - A few challenges (20)

More from Frederic Desprez (15)

Recently uploaded (20)

(R)evolution of the computing continuum - A few challenges

Editor's Notes