Evaluation of Real-Time Communication in IoT Services by WebRTC

Master's thesis
in the degree programme International Software Systems Science of the
Faculty of Information Systems and Applied Computer Sciences of the
Otto-Friedrich-Universität Bamberg
Author: Chandan Sarkar
Supervisor: Professor Dr. Udo Kireger
Submission date: 22.12.2018

Contents
List of Abbreviations
1. Introduction
1.1. Motivation
1.2. Related Works
1.3. Goal
1.4. Structure
2. Foundation
2.1. Internet of Things
2.1.1. Architecture
2.1.2. Building Blocks for IoT
2.1.3. Technical Ecosystem of IoT
2.1.4. Requirements
2.1.5. Challenges
2.2. WebRTC
2.2.1. The Session Initiation Protocol
2.2.2. WebRTC Architecture
2.2.3. WebRTC API
2.2.4. Emergence of New Communication Paradigms
2.2.5. Proposed WebRTC API for QUIC
2.2.6. RTCQUICTransport Interface
2.2.7. RTCQUICStream Interface
2.3. Proposed Design Specification
2.3.1. Design Goals for a Standard Framework
2.3.2. Network Architecture of The Prototype and Session Establishment Capabilities
3. Implementation
3.1. Prototype for Our Solution
3.1.1. Docker As Open Container-based Virtualization Platform
3.1.2. Components of our Prototype and their Constraints
3.2. Approaches for Building our Solution
3.3. Environment and Orchestration for Our Solution
4. Evaluation
5. Conclusion
Index
Appendix A. Appendix

List of Figures
1.1. Telemedicine web application interface [1]
1.2. Three dimensional video conferencing concept [2]
1.3. Prototype of affordable live streaming based on embedded devices [3]
2.1. IoT layered architecture [4]
2.2. IoT Fundamental Building Blocks [4]
2.3. IoT Technical Ecosystem [5]
2.4. SIP Trapezoid [6]
2.5. SIP Triangle [6]
2.6. WebRTC browser reliance [6]
2.7. WebRTC Architecture [7]
2.8. Standardized stack of QUIC [8]
2.9. RTCQUICTransport object lifetime as state machine representation
2.10. RTCQUICStreamState as state machine representation
2.11. IoT Network Architecture Prototype [9]
2.12. Connection Establishment for devices without a WebRTC Stack [9]
3.1. Contrast between container-based and hypervisor-based virtualization [10]
3.2. Docker architecture [11]
3.3. Overview of publish subscribe pattern [12]
3.4. Approach 1 for implementation of prototypical application
3.5. Socket.io implementation for interaction between static page and subscriber web application
3.6. Approach 2 for implementation of prototypical application
3.7. Sequence diagram for implementation
3.8. Isolated publisher module configured in raspberry pi
3.9. Subscriber module consuming the video

List of Tables
2.1. Similar behavior that QUIC shares with SCTP and DTLS
2.2. RTCQUICRole specification
2.3. RTCQUICStream properties specification
2.4. RTCQUICStream methods specification

List of Abbreviations
6LoWPAN IPv6 over Low-Power Wireless Personal Area Networks
AAA Authentication Authorization Accounting
BLE Bluetooth Low Energy
DTLS Datagram Transport Layer Security
EXI Efficient XML Interchange
FEC Forward Error Correction
HTTP Hypertext Transfer Protocol
HVAC Heating Ventilation Air Conditioning
ICE Interactive Connectivity Establishment
IETF Internet Engineering Task Force
IoT Internet of Things
JSEP JavaScript Session Establishment Protocol
LTE Long Term Evolution
LXC Linux Container
MEC Mobile Edge Computing
MQTT Message Queuing Telemetry Transport
NAT Network Address Translation
NFC Near Field Communication
ORTC Object Real-Time Communications
OSI Open Systems Interconnection
P2P Peer-to-Peer
QoS Quality of Service
QUIC Quick UDP Internet Connections
RDF Resource Description Framework
RFID Radio Frequency IDentification
RTCP Real-time Transport Control Protocol
RTP Real-time Transport Protocol
SCTP Stream Control Transmission Protocol
SDP Session Description Protocol
SIP Session Initiation Protocol
SOA Service Oriented Architecture
SRTP Secure Real-time Transport Protocol
SSL Secure Sockets Layer
STUN Session Traversal Utilities for NAT
TCP Transmission Control Protocol
TLS Transport Layer Security
TSN Transmission Sequence Number
TURN Traversal Using Relays around NAT
UDP User Datagram Protocol
VoIP Voice over IP
W3C World Wide Web Consortium
WebRTC Web Real Time Communication
XMPP Extensible Messaging and Presence Protocol

1. Introduction
In recent years, the Internet of Things (IoT) has continued to be a growing trend in computing
and innovation. According to the research and advisory company Gartner, the number of
connected IoT devices may already exceed the world's population and is estimated to cross
30 billion by 2020 [13]. IoT is expected to impact almost all industries. As smart devices
become more ubiquitous, network traffic, bandwidth demand and security concerns grow with
them. To counter these issues, computing at the edge of the network is consistently promoted
and the paradigm of "data follows computation" prevails.
IoT brings innovative solutions to a wide range of application areas. Smart devices monitor
home and office environments in order to save energy, enhance security and surveillance, or
simply improve the quality of living. The health care sector tries to help the weak and elderly
population with solutions such as fall detection by means of wearable sensors, analysis of
body movement patterns, or remote monitoring of health conditions with suggested medication.
The availability of precise location data for people and assets enables cutting-edge
location-based services, which make room for a myriad of possibilities, e.g. location-aware
augmented reality video games, smart transport systems or even the prediction of drastic
changes in weather conditions at a given location. Thus, one capability results in a multitude
of possibilities. In this study, we delve into the capability of multimedia and arbitrary data
transmission in the context of IoT and evaluate its potential.

1.1. Motivation
Sensors play a crucial role in IoT solutions since they are the key source of digital data. They
come in various shapes and sizes and detect various kinds of change in a given environment.
They sense an event in the physical world and transform it into some form of digital signal.
Sensors can be attached to physical objects or even worn by human beings in the form of
wearables. By combining various sensors within a communication network, multiple devices can
share information and form a smart distributed system that is capable of offering innovative
IoT solutions, e.g. video analytics.
Some of the prominent IoT solutions inherently depend on real-time audio and video
transmission. They use efficient techniques to process a stream of media data and make smart
decisions. This category of IoT solutions is steadily gaining popularity because of the
increasing availability of low-cost hardware and software components as well as storage
solutions. It is argued that processing video provides more value and a larger perspective for
decision making than traditional procedures based on data collected by other sensors. For
example, beacons in a retail store can offer analytics on the number of customers visiting
over a period of time or in a given season, based on the number of smart devices they detect.
Video analytics, however, may offer more in-depth knowledge of customers of different genders
and ages and of their purchase interests [14]. Two of the most popular modern use cases of
processing video in this context are:
• Surveillance is one of the most prominent IoT use cases for video analytics in recent
times. It contributes significantly to crime detection and monitoring in home and office
environments. It is estimated that improved crime detection and monitoring techniques via
surveillance can produce billions of dollars' worth of economic value by 2025 [14].
• Retail is the second most significant application area for processing video information.
Monitoring video feeds helps large shopping centers and retail stores in various ways
to ensure customer satisfaction and improve their business. For instance, shopping
centers can come up with promotional offers based on the interests of the customers
or allocate staff to the right places in the store at the right time to assist the customers.
This helps to create a satisfactory shopping experience, which benefits the business.

Furthermore, a few other emerging areas where video analytics could bring promising value
are public safety in large city transport systems, increased employee productivity in
manufacturing industries, and remote monitoring of medical conditions.
1.2. Related Works
Having discussed the value that real-time video processing can bring over the processing of
traditional sensor data, we briefly present some recent related works on capturing video data
in our context:
• Telemedicine is the diagnosis of patients and the prescription of medicine over
telecommunication technologies, which allows doctors to remotely assist people in need
across geographical barriers. Although telemedicine is not a new concept, it comes with
certain inherent constraints. Popular telemedicine solutions used by large hospitals and
health care centers are proprietary and not always easy to use. They rely on dedicated video
cameras or microphones, or sometimes require an application to be installed on the computer.
Some forms of diagnosis even need highly specialized equipment. From the consumer
perspective, these services are not always affordable. Similarly, procurement and maintenance
of such a setup could be problematic for some providers. To counter these limitations,
Antunes et al. suggested an easy-to-implement browser-based web application built on
WebRTC [1].
In their solution, both parties engaged in a session need a web browser, a web camera and
a microphone. The application uses a core module which is responsible for connecting two
peers, maintaining the session and identifying logged-in users. The application features a
browser-based interactive interface for text messaging and file sharing, as demonstrated in
figure 1.1.

Figure 1.1.: Telemedicine web application interface [1]
The implementation was tested with varying scenarios of communication between domestic
users and medical professionals, with promising outcomes.
• Sulema et al. [2] proposed the idea of a three-dimensional video conferencing system
based on WebRTC. The motivation for such a video conferencing application is twofold.
First, studies indicate that meeting face-to-face is the most effective way of communicating.
Where that is not possible, three-dimensional imaging could offer a benefit over
traditional video communication. Second, it is observed that consumers are more comfortable
communicating over higher-resolution and bigger interfaces, when available, than over
smaller counterparts such as smartphones.
The proposed three-dimensional video conferencing system is modularized into four
parts, namely:
- Three-dimensional image capturing module: responsible for real-time image
recording. It can be implemented with one or several cameras depending on the
layout of the system.
- Data processing module: responsible for real-time processing and serialization of
visual data in order to make it appropriate for the three-dimensional renderer.
- Communication module: responsible for the exchange of information.
- Three-dimensional image visualization module: responsible for rendering image
data into a three-dimensional appearance.
While the data processing and communication operations can be handled by any capable
computing system, the three-dimensional image visualization module needs further
clarification.

Sulema et al. [2] describe that three-dimensional image visualization can be achieved
in two ways: a glass-based solution, where viewers engaged in the communication need to
wear specialized glasses, and a glass-free solution, where some kind of three-dimensional
reflector is used. They favor the glass-free solution on the grounds of cost-effectiveness
and comfort. Three-dimensional reflectors usually rely on four images to build the
perspective of four side-wise viewpoints. If the images captured from these four viewpoints
are displayed on a bright enough screen, the three-dimensional reflection device creates an
illusion that the human visual system perceives as a three-dimensional reflection.
In order to ensure accurate processing of image data from all four viewpoints,
Sulema et al. [2] suggest the use of a media server, e.g. Kurento [15]. Kurento integrates
WebRTC with advanced computer vision APIs and offers transmission, processing, augmented
reality and various other facilities in order to make the development of media-oriented
applications simpler.
Figure 1.2.: Three dimensional video conferencing concept [2]
In order to demonstrate their prototype, they obtained video streams from the four
mentioned viewpoints and used the media server to detect the subject of interest and
replace the background of each frame with a dark color. This stream was then transmitted
and visualized with the help of a three-dimensional reflector device placed on a smart
device screen, as depicted in figure 1.2.
• While the two works above build on the native WebRTC API and on a media server that
handles more sophisticated computation, our third specimen is on the simpler side
and promotes an affordable approach. Baris Unver from the University of Minnesota
argues that there are essentially three premium options for creating a video streaming
system in the context of IoT, namely:
- Media server: customers either buy licensed software and install it on a server, or
pay a one-time or recurring price to a provider offering a media service. When acquiring
server space, bandwidth has to be considered, and maintainers need some server
administration skills as well. Managed services, e.g. AWS, on the other hand relieve the
users of the burden of managing a server and offer all computing resources as cloud
services.
- Self-streaming via IP camera: another option, which comes with a built-in server.
Customers only need a fixed IP address and port mapping to consume the media stream.
- Streaming websites or applications, e.g. YouTube, Facebook and SnapChat: these
can be considered as options, but the services are inherently public, have fewer
customization options and do not offer any privacy measures out of the box.
Unver therefore promotes a simpler and much more affordable prototype based on embedded
devices such as the Raspberry Pi and on WebRTC, with a simpler setup as depicted in
figure 1.3.
Figure 1.3.: Prototype of affordable live streaming based on embedded devices [3]

Unver uses a Raspberry Pi module and a USB web camera as the main components
alongside other essential parts. He uses the Chromium browser to access the media stream
track from the underlying window object and to create the peer connection. The video
stream can be stored on the Raspberry Pi and used for video surveillance [3].
1.3. Goal
We illustrated three specimens that relate to the subject matter we are interested in.
A careful analysis of these specimens reveals some constraints of their implementations:
• The telemedicine solution is inherently browser dependent. In order to engage in
interactive bi-directional communication, both parties need a web browser that supports
the WebRTC specification stack. Some sort of customized signaling solution is also
needed to implement the web conference room.
• Three-dimensional video conferencing relies on a media server for a heavy load of
image processing computation, and the solution is focused on some conceptual use cases.
• The affordable surveillance camera solution is simple and cost-effective but hides all
design details and thereby does not reflect a standard IoT solution design pattern.
Besides, this solution also depends on a web browser that supports the WebRTC
specification stack.
While these implementations serve their specific purposes, none of them features a
standardization of the framework in the context of IoT. In spite of being distributed
applications, they do not feature any specific design pattern that could become a standard
for designing distributed IoT applications. As we discuss in detail in the foundation
chapter, physical objects are embedded with lightweight sensory modules in order to
transform them into sources of data. Therefore, these IoT objects must function with limited
computing power. An IoT solution for video streaming must not create any resource bottleneck
in such devices. Furthermore, the video stream is meant to be consumed with the help of
web browsers or smart devices such as phones or tablet computers. We want to avoid
installation or heavy configuration at the consumer end. We seek a lightweight solution for
video streaming which allows fast prototyping and avoids configuration overhead for the
consumers. We need an appropriate distributed system design pattern and a service-oriented
approach to boost scalability and maintainability.
Therefore, in this study we set our goal to build a conceptual foundation of the IoT along
with its constraints. We propose a standardization of an IoT application framework with the
help of WebRTC in order to address the inherent challenges of IoT. We adopt an appropriate
architectural design pattern in order to create a fast, efficient, browser-independent,
maintainable, scalable and distributed prototypical IoT solution. Finally, we evaluate our
approach based on the objective and the outcome.
1.4. Structure
In this study, we start by discussing the foundational concepts of the IoT as a paradigm. In
the first section of the foundation chapter, we discuss the coarse-grained three-layer
architecture and thereafter the fine-grained five-layer architecture of IoT. Subsequently,
the fundamental building blocks and the technical ecosystem of IoT are discussed. We attempt
to illustrate the impact of emerging networking and communication technologies on these
building blocks and the overall IoT ecosystem. Furthermore, we discuss the requirements and
challenges that developers encounter while realizing IoT solutions. We deduce from this
discussion that the standardization of the framework could be a viable option to address
some of these requirements and challenges. We introduce WebRTC as a worthy candidate for the
standard framework to build IoT solutions.
The second section of the foundation chapter introduces the core concepts of WebRTC. We
start our discussion with classic web communication using the client-server paradigm.
Thereafter, we prepare the ground for WebRTC with the introduction of SIP. We discuss the
architecture of WebRTC along with its traditional API specification. The discussion of
real-time communication over the network is not complete without an appreciation of the
emerging communication protocols, e.g. HTTP/2 and QUIC. We introduce the HTTP/2 and QUIC
protocols of Internet communication along with the features and promises they bring to the
modern networking landscape. We conclude this section with a brief introduction of the
newly proposed QUIC API specification for WebRTC.
In the third section of the foundation chapter, we specify the design goals for an IoT
application which uses WebRTC as a standardization framework. Furthermore, we discuss an
application design prototype proposed by Schulzrinne et al. [9]. We take motivation from this
prototype in order to implement our solution, which constitutes the subject matter of the
implementation chapter.
We start the implementation chapter with the prototype section. In this section, we discuss
Linux container virtualization as an enabler of hardware-independent fast prototyping. We
discuss Docker as an open platform based on Linux container technology and choose it as
the orchestration engine for our prototypical application because of its robustness,
efficiency and ease of use. Subsequently, we introduce and adopt the publish-subscribe
design pattern of distributed systems and modularize our solution into distinct components,
namely publisher, subscriber and mediator, each having a specific purpose. We specify the
constraints on each of these components and explain our choice of underlying software
frameworks and programming languages to implement them. In the subsequent approach section,
we describe two of our design approaches to building the first functioning prototype of our
solution. Thereafter, we proceed to the environment and orchestration section to refactor
the preliminary prototype with reference to practical use cases. We deploy the publisher
component on a Raspberry Pi device in a separate orchestration environment in the network.
We then note our observations of the application behavior after this revision, along with
the associated challenges, before evaluating our application based on the predefined design
goals.
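The division into publisher, subscriber and mediator can be illustrated with a minimal in-process sketch. It is not the prototype described in the implementation chapter; the class name Mediator, the topic names and the byte payloads are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class Mediator:
    """Hypothetical mediator that routes payloads from publishers to subscribers by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber registers interest in a topic without knowing any publisher.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        # A publisher hands the payload to the mediator; returns the number of deliveries.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


received: List[bytes] = []
mediator = Mediator()
mediator.subscribe("camera/front", received.append)   # subscriber side
delivered = mediator.publish("camera/front", b"frame-0")  # publisher side
ignored = mediator.publish("camera/back", b"frame-0")     # no subscriber for this topic
```

Because publisher and subscriber only share a topic name, either side can be replaced or moved to another host without changing the other, which is the decoupling the pattern is chosen for.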
The evaluation chapter contains our assessment of the prototypical application based on the
proposed design goals of an IoT solution which uses WebRTC as the standardization framework.
Furthermore, we dissect the behavior of the application and the performance bottlenecks
during its execution, along with our approaches to mitigate them. We reflect on some of our
design choices and propose respective alternatives before moving on to the conclusion
chapter.
Finally, in the conclusion chapter we look back on our work. We reflect on the limitations
of the similar works that we studied earlier and on our approach to address those
limitations. We further discuss the challenges we encountered in our implementation and
propose potential mitigation strategies in order to improve on our work.

2. Foundation
Having set the objective of this study, we start by building our foundation for IoT as a
concept. We discuss the architectural specification, the fundamental building blocks and the
inherent challenges that occur while realizing an IoT solution.
2.1. Internet of Things
Atzori et al. [16] depict IoT as the ubiquitous presence of smart objects around us and their
interaction with each other as well as with us. These smart objects, ranging from Radio
Frequency Identification (RFID) tags and sensors to mobile phones, interconnect in order to
exchange data and create opportunities for integrating the physical world with
computing-based systems. Together they contribute to the quality of everyday life, to
efficiency, to reduced human intervention and, last but not least, to economic benefits.
These developments are made possible by advanced research work in RFID technology, smart
sensors, and emerging communication and networking protocols. A relevant example of
domain-specific research work is the implementation of thermostats and Heating Ventilation
and Air Conditioning (HVAC) systems in order to realize the idea of smart homes. Other areas
that the Internet of Things revolutionizes are health care, fast emergency response,
transportation, and industrial automation to minimize manual labor, especially in hazardous
environments.
Al-Fuqaha et al. [4] mention architecture standardization as a key factor for the success of
IoT. To achieve this, the traditional Internet architecture needs to be revised. With the
growing number of objects being connected to the Internet every day, the underlying network
protocols must be capable of handling network bottlenecks. Evans et al. [17] mention that
back in 2010 the number of connected objects on the Internet exceeded the total human
population on earth. Observations like this call for an evolution in networking paradigms as
well as in the underlying security measures. To illustrate the impact of IoT from the
commercial point of view, John Gantz et al. [18] estimate that by the end of 2020 the number
of network connections and functional IoT smart objects around the globe will exceed
212 billion. Moreover, according to Stuart Taylor et al. [19], by 2022 up to 45 percent of the
total Internet traffic is expected to be generated by machine-to-machine interactions.
According to these predictions, health care and manufacturing IoT applications will make a
distinct impact on the global economy. Therefore, in this section, we illustrate the Internet
of Things from the architectural and specification point of view.
2.1.1. Architecture
IoT has to manage the network traffic generated by a vast number of interconnected smart
devices. Hence, it needs a flexible layered architecture and abstraction. No architecture
has been finalized despite the many propositions made so far. One of the most prominent
ones is IoT-A [20], which attempts to model a common architectural specification based on
the needs of research and industry. The works of Khan et al. [21], Yang et al. [22] and
Miao Wu et al. [23] propose a basic three-layer architectural model consisting of
application, network and perception layers. Some recent works such as Atzori et al. [16] and
Chaqfeh et al. [24] add more abstractions and propose a five-layer model as shown in
figure 2.1.
Figure 2.1.: IoT layered architecture [4]

We describe each layer based on the studies of Al-Fuqaha et al. [4] and Abdmeziem et
al. [25]. We start with the high-level three-layer approach and transition to the more
granular five-layer approach, which has been adopted in modern IoT applications.
The high-level approach of IoT consists of three layers, namely the perception layer, the
network layer and the application layer.
• Perception Layer
The main goal of the perception layer is to capture the physical properties of the
real-world things around us with the help of various sensing technologies. Another role of
the perception layer is the analog-to-digital conversion of signals, which makes signal
transmission over the network easier. In order to harness the potential of physical
objects, they need to be embedded with microchips. Doing so converts the physical
objects into smart elements which play key roles in IoT applications.
• Network Layer
The most crucial responsibility of the network layer is to transmit the data collected
by the perception layer to the application layer over wired, wireless or local area
networks. Some of the popular technologies are UMTS, LTE, Wi-Fi, Bluetooth, ZigBee,
Infrared etc. Moreover, a large amount of data needs to be stored and processed,
preferably in a distributed manner, which makes cloud computing a suitable solution.
• Application Layer
The application layer makes use of the data gathered, transmitted and processed by the
previous two layers and acts as the front end of IoT applications. This layer offers
endless possibilities for developers to use the potential of smart physical objects to
create new services, e.g. smart transport, logistics, identification, location-based
services and many more.
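The flow of data through the three layers can be sketched in a few lines. This is only an illustration under stated assumptions: the 8-bit quantization, the 5 V reference, the dictionary message format and the alert threshold are all hypothetical and are not taken from the cited architecture papers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reading:
    """A digitized sensor sample produced by the perception layer."""
    sensor_id: str
    value: int  # quantized value, assumed 8-bit resolution (0..255)


def perceive(sensor_id: str, analog_value: float, v_max: float = 5.0) -> Reading:
    """Perception layer: analog-to-digital conversion of a sensed signal."""
    clamped = min(max(analog_value, 0.0), v_max)
    return Reading(sensor_id, round(clamped / v_max * 255))


def transmit(readings: List[Reading]) -> List[Dict[str, object]]:
    """Network layer: serialize readings for transport (stand-in for Wi-Fi, LTE, ...)."""
    return [{"id": r.sensor_id, "value": r.value} for r in readings]


def application(messages: List[Dict[str, object]], threshold: int = 200) -> List[str]:
    """Application layer: turn received data into a service, here a simple alert list."""
    return [str(m["id"]) for m in messages if int(m["value"]) > threshold]


alerts = application(transmit([perceive("temp-1", 4.5), perceive("temp-2", 1.0)]))
```

Each function only depends on the output format of the layer below it, which mirrors how the layered model separates sensing, transport and service logic.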
While the three-layer approach sets the stage and proposes possibilities for different IoT
applications, its more generic view does not always help with planning, designing and
implementing a concrete IoT solution. Moreover, it does not always conform to the modern
service-oriented architecture paradigm for software development. Therefore, we now delve
into the more granular five-layer approach of the IoT architecture.

• Objects Layer
The lowermost layer of the five-layer architecture represents the actual physical
devices such as sensors or actuators, which record data from physical events or states
such as motion, location, temperature, acceleration, vibration or weather conditions.
The objects layer, also known as the perception layer, gathers the data and transfers it
to the object abstraction layer. The objects layer also generates a huge amount of data
and is a key enabler of the Big Data paradigm.
• Object Abstraction Layer
The second layer of the five-layer architecture acts as a bridge between the underlying
objects layer and the service management layer. The object abstraction layer is capable of
encapsulating various networking and connection standards such as GSM, UMTS, WiFi,
Bluetooth etc. in order to securely transfer the data to processing units. This layer can
also provide a distributed cloud data management abstraction and act as a temporary host
of the data before it is processed.
• Service Management Layer
This layer, sometimes also known as the middleware, provides the necessary abstraction
between the heterogeneous data sources and the software services that consume the data
in order to generate useful knowledge. Application and service developers often need to
use various kinds of datasets without being concerned about a specific hardware
platform. This layer provides that abstraction to the developers and lets them develop
business logic without being concerned about the data source.
• Application Layer
End users communicate with this layer in order to consume services. This layer exposes
various services in the form of applications on several platforms and software
ecosystems. Manufacturing high-quality smart devices is made possible by introducing
this layer. The real-world implementation of this layer is spread across various markets
and industries, e.g., smart home, smart city, health care or transportation, to name a few.

• Business Layer
Last but not least, we have the business layer. The application layer needs control and
management capabilities for real-world aspects like monetization, so the business layer
manages the overall system and creates a business model for the offered services. This
layer often handles statistical work such as creating graphs or charts to visualize
market trends and service consumption, which contributes to the optimization of the
system from a business perspective. This layer can also be a direct consumer of the
application layer services. One popular example would be big data analytics.
The five-layer approach can provide a specification and standard under which end users
consume the relevant data in the form of various applications and web services without
being concerned about the underlying complexity and heterogeneity of the data sources.
Furthermore, the five-layer approach accommodates a business layer specification that is
responsible for analytics and business models in order to improve the quality of services
and create new business opportunities. With this understanding of the core architecture,
we now switch our focus to the fundamental building blocks of IoT.
2.1.2. Building Blocks for IoT
Al-Fuqaha et al. [4] describe the fundamental building blocks of IoT as elements and
illustrate their criticality in realizing IoT services, as depicted in Figure 2.2. In
this study we discuss them individually.
Figure 2.2.: IoT Fundamental Building Blocks [4]
• Identification
Al-Fuqaha et al. [4] describe the need to match IoT services with their demand.
Identification is one of the most crucial services in this respect. We need to
correlate the ID and the address of a given smart object in a given network. The ID
uniquely identifies a given smart object in an IoT orchestration and the address
provides its location in the network. This correlation is also within the scope of
identification services. There are several options for identifying objects in a given
network. Gubbi et al. [26] illustrate RFID as a promising option. RFID tags store an
electronic product code in an embedded microchip and are used to automatically identify
the smart objects they are attached to. Passive RFID tags are powered by the signal
radiated from the RFID reader, which makes them power-efficient. We can observe the
substantial use of passive RFID tags in transport systems. In contrast, active RFID
tags use their own battery power and are widely used in places like cargo containers.
• Sensing
Gubbi et al. [26] note that research on power-efficient integrated circuits and
wireless networks enables the development of cost- and energy-efficient smart sensors.
These sensors can collect, analyze and often send a large amount of useful information
collected from various environments to data warehouses. Al-Fuqaha et al. [4] further
emphasize that these sensors come in many form factors, such as small-scale camera
sensors or even wearable devices like smartwatches. These sensors can record many kinds
of useful information for processing and for enabling value-added services. Some
prominent examples are traffic camera sensors, heart-rate trackers and smartwatches.
Gubbi et al. [26] assert that active RFID tags can also be considered small-scale
sensors with limited processing capability and storage. Some of the more prominent and
efficient solutions are built with single-board computers, e.g., Arduino or Raspberry
Pi, with integrated network protocols.
• Communication
Al-Fuqaha et al. [4] explain that the communication technologies used to realize IoT
services connect a large number of heterogeneous devices to present a cumulative
solution over an unreliable network. The core communication technologies widely used in
IoT scenarios include Bluetooth, Long Term Evolution (LTE), WiFi, RFID and Near Field
Communication (NFC). Each of these communication technologies has its own
characteristics.
RFID is one of the key enablers of machine-to-machine interaction in IoT scenarios
and is implemented through radio frequency readers and RFID tags. Usually, radio
frequency readers query for any tag signal in a given location and, with the help of a
database, uniquely identify the particular smart object from the signal received from
the tag. Compared to RFID technology, NFC is suitable for short-range, low-bandwidth
data transfer.
According to Ferro et al. [27], one of the most prominent communication technologies
providing high bandwidth and supporting a longer range is WiFi. It relies on radio
waves [4] and provides a comparatively reliable communication mechanism for smart
objects. In contrast, Bluetooth relies on short-wavelength radio technology to use the
power of smart devices efficiently. More recently, Bluetooth Low Energy (BLE) has
become an enabler of many location-aware smart IoT services. So far we have considered
statically bound smart devices, but modern IoT is not limited to static objects. We
also need to consider the mobile-to-mobile communication scenario, which leads to LTE
[28] technology. According to Ghosh et al. [29], LTE Advanced extends the bandwidth and
supports fast-traveling mobile devices with low latency and high throughput.
Furthermore, in the ongoing year 2018, the 5G specification of cellular communication
is being tested across the globe, with 5G-enabled smart devices expected to launch from
2019 onwards. 5G promises bandwidth as high as 10-20 Gbps with even lower latency in
data transfer. These increased capabilities in data transmission should trigger the
deployment of a new breed of IoT applications, which feature rapid data collection and
transmission over the network. Thus we observe that existing and upcoming communication
technologies are likely to work as enablers of several smart services, making
themselves the backbone of modern IoT.
• Computation
The impression that we have developed about IoT so far indicates that the paradigm
relies on computing systems that consume little power and operate with limited
resources. Al-Fuqaha et al. [4] explain that microcontrollers, microprocessors and the
operating systems running on them form the core of IoT. Some of the most common
examples in this domain are Arduino, Raspberry Pi and Intel Galileo. Another key factor
is the real-time lightweight operating systems powering these devices. Al-Fuqaha et al.
[4] provide some prominent examples of these operating systems, such as Contiki Real
Time OS, whose built-in simulator helps researchers simulate the behavior of
applications based on wireless sensing. The effort of companies like Google to
introduce new capabilities to the Android software ecosystem to support smart vehicular
networks is also worth a mention [30].
The impact of the cloud computing paradigm on modern IoT is substantial. Smart sensors
often produce a huge amount of data over time, which is usually sent to data warehouse
applications hosted in the cloud. The data is analyzed in the cloud in real time in
order to generate useful knowledge. The cloud provides the scalable solutions and
computing resources that are necessary to analyze this big data and often provides
decision-making capabilities. Thus computation is yet another key operational part of
IoT.
• Services
The major IoT services can be classified broadly into four categories, namely
identity-related, information aggregation, collaboration-aware and ubiquitous services.
In this study, we shed some light on these major services.
Identity-related services are considered to be one of the basic building blocks of other
mentioned categories of services. Using a large number of smart physical objects in
IoT is quite common and most often they need to be uniquely identifiable in a given
area, which is taken care of by the Identity-related services. In IoT, we often use smart
sensors across a given environment whose primary responsibility is to gather a huge
amount of raw data such as weather conditions or medical situations which need to be
analyzed later to gather useful knowledge. This information gathering generally is part
of information-aggregation services. Collaboration-aware services encapsulate the act
of processing the data and generating useful knowledge for decision making. Ubiquitous
services attempt to make the collaboration-aware services available virtually
everywhere. One of the major goals of IoT is to reach the ubiquitous service stage, but
in practice most IoT implementations fall under the first three categories.
Now that we have a foundational idea of the potential service categories for IoT,
we discuss some prominent implementations.
Cook et al. [31] describe the smart home implementation and its impact on improving
the quality of life. In a smart home environment, several internal appliances such
as heating systems and air conditioners and external entities such as smart grid
systems inter-operate in order to create partially autonomous, improved living
conditions for human beings. Yongfu et al. [32] discuss that transportation
cyber-physical systems have the potential to bridge computing and communication systems
to modernize transportation. Al-Fuqaha et al. [4] discuss that, conceptually, a typical
intelligent transportation system encapsulates four main subsystems, namely the
vehicular network, the base station, intelligent transport system monitoring and, last
but not least, the security subsystem. Many pioneering automobile and software
companies such as Audi, Tesla or Google have started researching the next generation of
self-driving smart automobiles leveraging the capabilities of IoT.
From the illustrations of Al-Fuqaha et al. [4], Atzori et al. [16] and Gubbi et al.
[26], we mention some other prominent service scenarios such as industrial automation,
smart health care, smart grid systems and smart cities. Industrial automation is about
improving operations and productivity, mainly focusing on factors such as sensing,
communication, processing and transportation. Smart health care uses smart clinical
sensors to monitor patients' health conditions and analyzes clinical data in real time
to arrange medical care. Smart grid systems use IoT to connect the energy meters of
large areas to the networks of energy providers, enabling them to monitor consumption
and make rational decisions on resource management with respect to demand. Smart grid
systems promote the efficient consumption of energy and reduce wastage. Smart city
projects leverage many of the services we have described so
far and attempt to improve the overall quality of life for inhabitants by making useful
information and services available in an efficient and convenient manner.
• Semantics
Semantics in this context refers to the intention behind IoT. Al-Fuqaha et al. [4]
define semantics as the act of generating knowledge that enables various kinds of
services; it requires discovering and modeling resources that can act as sources of
useful information. Therefore, Al-Fuqaha et al. [4] describe semantics as forming the
core of IoT. The semantics building block is supported by web technologies such as the
Resource Description Framework (RDF).
Eric Miller in his work [33] describes RDF as an infrastructure to exchange, encode
and re-use structured metadata. Specified on top of XML, the Resource Description
Framework defines the methods to express the semantics. RDF provides both
human-readable and machine-processable terms to express the semantics. Additionally,
Al-Fuqaha et al. [4] discuss the adoption of Efficient XML Interchange (EXI) by the
W3C. Optimization features built into EXI reduce the required bandwidth by converting
XML messages to binary, thereby reducing the power, memory and storage consumed for
processing.
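To make the idea of machine-processable semantics more concrete, the following minimal sketch expresses a single sensor reading as RDF-style subject-predicate-object triples and serializes them in N-Triples syntax. N-Triples is used here instead of the XML serialization only because it is easier to read line by line. All URIs under example.org, the property names and the helper functions are hypothetical and chosen purely for illustration; they are not taken from [4] or [33], and no RDF library is used.

```python
# Illustrative only: all URIs below are hypothetical placeholders.
BASE = "http://example.org/iot/"


def iri(name: str) -> str:
    """Wrap a local name into an absolute IRI in N-Triples notation."""
    return f"<{BASE}{name}>"


def literal(value: str) -> str:
    """Encode a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_reading(sensor_id: str, quantity: str, value: float) -> list[tuple[str, str, str]]:
    """Return subject-predicate-object triples describing one sensor reading."""
    subject = iri(f"sensor/{sensor_id}")
    return [
        (subject, iri("observes"), iri(f"quantity/{quantity}")),
        (subject, iri("hasValue"), literal(str(value))),
    ]


def to_ntriples(triples) -> str:
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(describe_reading("t1", "temperature", 21.5)))
```

Each statement names a resource, a property and a value, which is the part of the model that allows a consuming service to interpret the reading without knowing the sensor hardware.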
Having grasped the essence of the fundamental architecture and the basic building blocks
of IoT, we realize that IoT creates the possibility for a new generation of ubiquitous
applications and services. Udoh et al. [5] describe that real-world objects embedded
with capabilities such as sensing, identification, communication and computation can
expose useful information, help in a decision-making process or provide other services
of benefit. But these possibilities are not without challenges. According to Patel et
al. [34], two of the major challenges are the difficulty of defining roles for the
various components of an IoT implementation and the scarcity of frameworks that address
the challenge posed by heterogeneity in IoT. Nguyen et al. [35] discuss that another
notable challenge is choosing an appropriate programming abstraction for the different
technical layers of an IoT implementation.
Udoh et al. [5] describe that a standard framework can address the major challenges of
heterogeneity and complexity in a distributed system paradigm and at the same time
perform the heavy lifting of Big Data handling and define a standard architecture along
with its implementation. In the following subsections, we review IoT as a technical
ecosystem along with its standard requirements, followed by key challenges and
associated recommendations.
2.1.3. Technical Ecosystem of IoT
In this section we take a glimpse at the technical ecosystem of IoT based on the study
of Udoh et al. [5]. We discussed some of these technologies before; therefore, in this
section we review where these technologies fit in before discussing the associated
requirements and challenges.

Figure 2.3.: IoT Technical Ecosystem [5]
Figure 2.3 shows some of the enabling technologies of IoT grouped by building block. We
briefly illustrate this technical ecosystem, starting from the so-called physical layer.
• Physical Layer
As we mentioned before, our physical world consists of various objects that can be
embedded with smart microchips to turn them into sources of data. These devices can be
categorized depending on the tasks they are intended to perform. One of the most popular
categories is sensors, which are primarily responsible for converting a physical
parameter into a digital input. Various technologies have been tried and tested to
realize sensors. Udoh et al. [5] present RFID as the key technology for the
identification of physical objects. Other proposed mechanisms for identification are
Ubiquitous Codes and Electronic Product Codes [36].
• Communication Layer
Identification needs help from the addressing mechanisms encapsulated in the
communication layer, because identification alone cannot uniquely identify a physical
object in a global context. Udoh et al. [5] propose the IPv6 addressing scheme along
with IPv6 over Low-power Wireless Personal Area Networks (6LoWPAN) as an appropriate
addressing mechanism for IoT devices with limited processing capacity. Sensing as a
building block, bridging the physical IoT objects and the virtual world of computation,
is inherently dependent on the communication layer. Li et al. [37] discuss that the use
of WiFi wireless sensor networks is beneficial in smart grid, smart environmental
protection and agricultural application scenarios.

Like the network protocols we discussed, session and application protocols are
also key components of the communication layer. While HTTP over TCP is the general
protocol of the Internet, Yokotani et al. [38] explain notable limitations of
standardizing on HTTP in IoT scenarios. Physical objects like sensors often gather a
huge amount of data and transfer it as tiny chunks over the network. HTTP is
implemented over TCP, which adds the overhead of the ISO/OSI layer stack. Message
fragmentation can happen, which results in the reassembly of fragments at the receiver
end. Furthermore, HTTP is prone to head-of-line blocking. All these factors are a
considerable overhead for physical objects with limited memory and processing
capacity. Therefore, Message Queuing Telemetry Transport (MQTT), proposed in [39],
is a viable alternative for the IoT scenario. As per Yokotani et al. [38], MQTT reduces
the protocol overhead and provides an efficient communication mechanism for IoT. It
relies on the so-called publish/subscribe paradigm for message passing. Data chunks are
published under topics and are delivered to the clients that have subscribed to the
respective topic. In this context, Message Queuing Telemetry Transport for Sensor
Networks (MQTT-SN) was proposed by IBM [40], especially for resource-constrained
devices. MQTT-SN adds some optimizations over MQTT, which make it suitable as a session
protocol for IoT communication.
• Platform Layer
Subsequently, we have the platform layer, which encapsulates a key component, namely
the middleware. Udoh et al. [5] describe that the middleware works according to a
Service Oriented Architecture (SOA) paradigm and provides the necessary abstraction to
counter challenges such as heterogeneity, interoperability, security and
context-awareness. Furthermore, the emergence of platform layer capabilities such as
Big Data analytics or Fog Computing is a necessary step towards achieving ubiquity and
the integration of useful IoT services into daily life.
Last but not least, the application layer encapsulates the capabilities and services
that end users interact with. Facilities like smart home, smart grid, smart city,
industrial automation, smart transportation, etc. find their place in this layer.
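The publish/subscribe paradigm described for MQTT in the communication layer can be sketched with a small in-memory broker. This is not an MQTT implementation and involves no network transport; the class `Broker`, its methods and the topic names are hypothetical and are introduced only to illustrate how publishers and subscribers are decoupled through topics.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, bytes], None]


class Broker:
    """In-memory stand-in for a message broker (illustration only)."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every message on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver `payload` to all subscribers of `topic`; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    received = []
    broker.subscribe("home/livingroom/temperature", lambda t, p: received.append((t, p)))
    broker.publish("home/livingroom/temperature", b"21.5")
    print(received)
```

In this sketch the publishing sensor only needs to know the topic name, not which services consume the data, which is the property that keeps the per-message overhead on a constrained device small.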

Now that we have discussed the technological ecosystem empowering the IoT paradigm, we
need to understand that a clear demarcation among the layers is difficult to achieve.
Udoh et al. [5] mention that a given component in the ecosystem is not tightly bound to
any particular layer; instead, a given software component or capability can be
integrated within multiple layers. As an example, a given application component could be
consumed by end users at the application layer as a standalone service and, at the same
time, be part of the network edge, participating in an orchestration to create another,
more complex service. In order to ensure Quality of Service, an IoT implementation must
be quality-driven and address factors like security, reliability and adaptability. In
the next subsection, we illustrate these requirements [5].
2.1.4. Requirements
Based on the technological ecosystem we discussed in the previous subsection, here we
illustrate some essential quality requirements for IoT applications from a broader
perspective, referring to Udoh et al. [5].
• Security
Privacy, authenticity, confidentiality and integrity are core values of data security.
Suo et al. [41] explain why security should be considered one of the top priorities
in IoT, broadly for the following reasons.
– IoT extends the traditional network paradigm to a complex orchestration of mo-
bile and sensor networks.
– A huge number of unorthodox physical objects are likely to be connected to the
Internet and will engage in communication.
These new paradigms and objects will generate a huge amount of data creating an
unprecedented challenge for data security and integrity in services being offered to
end users.
• Adaptability
IoT systems consist of constellations of heterogeneous smart objects from the physical
world inter-operating over the network. They vary in terms of state and capacity to
process data. Data gathered from different smart objects is used to develop different
categories of services and applications. This heterogeneity alone makes IoT systems
highly dynamic, which requires them to be adaptive, to recover from disasters and
faulty states with minimal manual intervention and to function under different
circumstances. IoT systems must be able to react effectively to changing context and
circumstances [5].
• Intelligence
Kortuem et al. [42] explain in their studies that emerging technologies such as
location awareness and sensor networks are transforming physical objects into smart
objects, which in turn are the technological building blocks of IoT. These smart
objects should be able to make context-aware decisions. When we talk about such
decision making, we should mention another category of smart objects, namely actuators.
Actuators are responsible for converting digital signals into physical output. In
practical IoT scenarios, data gathered by sensors is often digitally processed for
knowledge generation and decision making, and the actuators then interact with the
physical world to materialize the digital decision. In order to achieve this, smart
objects should be equipped with intelligent mechanisms such as complex event processing
and data analytics [5].
• Timeliness
Current research attempts to implement the IoT paradigm in time-sensitive use cases
where delay tolerance is very low, such as vehicular networks that help avoid road
accidents, traffic monitoring, or the monitoring of medical conditions in health-care
scenarios. Hence IoT must be able to deliver timely services without considerable delay.
• Regulation Compliance
The advancement of projects like smart homes or smart cities has created a unique
privacy concern. Smart sensors located in smart homes often collect a massive amount of
data about the environment. This data could take various forms, e.g., energy usage in a
managed environment or patterns of presence of the dwellers in a smart home, which
could be used to build an approximate profile of individuals and their lifestyles. In
the event of a data security breach, this data could be used by malicious third parties
to cause significant harm. This possibility of a potential breach of privacy must be
addressed. Regulations must be in place to control the collection of individuals' data
and to regulate how and in what ways the data is used for knowledge generation and
smart decision making to realize the IoT paradigm.
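The cycle described under Intelligence, in which sensor data is processed into a decision that an actuator then materializes, can be sketched as follows. The threshold rule, the `Heater` class and the default values are hypothetical and stand in for the far richer mechanisms, such as complex event processing, mentioned above; this is a minimal sketch, not a proposal from [5] or [42].

```python
from dataclasses import dataclass


@dataclass
class Heater:
    """Actuator stand-in: turns a digital decision into a physical state."""
    on: bool = False

    def apply(self, command: str) -> None:
        # Only the "heat" command switches the heater on.
        self.on = (command == "heat")


def decide(temperature_c: float, target_c: float = 20.0, tolerance_c: float = 0.5) -> str:
    """Context-aware rule: heat when the room is colder than the target band."""
    if temperature_c < target_c - tolerance_c:
        return "heat"
    return "idle"


def control_step(reading_c: float, heater: Heater) -> str:
    """One sense-decide-actuate cycle for a single temperature reading."""
    command = decide(reading_c)
    heater.apply(command)
    return command


if __name__ == "__main__":
    heater = Heater()
    for reading in (18.0, 19.8, 21.0):
        print(reading, control_step(reading, heater), heater.on)
```

Keeping the decision rule separate from the actuator makes it possible to replace the simple threshold with a data-driven model without touching the device-facing code.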
Adhering to these requirements while guaranteeing quality of service in IoT systems that
feature heterogeneity and complexity is not an easy task and poses several challenges.
In the next subsection, we discuss some of the prominent challenges that IoT systems
must address.
2.1.5. Challenges
In this subsection, we illustrate some of the challenges of IoT systems as explained by Udoh
et al. [5], posed by the inherent requirements and various technical specifications.
• Inherently Distributed
IoT systems are distributed by nature. We can imagine that in a complete IoT solution,
some of the resource-intensive core processing capabilities might be offloaded to the
centralized cloud while at the same time relatively lightweight tasks, such as data col-
lection would be performed by several smart sensors or actuator objects. Furthermore,
the applications that end users are supposed to interact with need to be developed for
web or mobile platforms. Moreover, one IoT system could act as a building block for
a more complex and larger system. Developing such a distributed system without a
centralized approach is a challenging task.
• Deep Heterogeneity
Interoperability among heterogeneous components in IoT systems is one of the key
challenges to overcome. This heterogeneity may stem not only from the diversity of
smart objects and the nature of the communication networks but also from vendors and
manufacturers with varying standards and Quality of Service (QoS) promises. Since
a complete IoT solution often relies on several different components throughout its
separate architectural layers, heterogeneity is a fundamental challenge that needs to
be dealt with.
• Data Management
IoT is one of the key enablers of the big data paradigm. Smart objects collect a huge
amount of data over time and transmit it over the network for centralized or distributed
processing to generate knowledge. For instance, the illumination or heating sensors
installed in a smart home environment might record the times and durations for which
the respective systems are active. This data could be a source from which to derive an
accurate pattern of the lifestyle of the dwellers. This pattern could be useful in
customizing the smart home environment to reduce energy consumption. Accumulating
and transmitting such data also poses security and privacy risks. In order to derive an
accurate pattern for such use cases, the data must be uncontaminated. Hence, achieving
data confidentiality, integrity and availability is a potential challenge for IoT
systems.
• Application Maintenance
Over time, several surveys have been conducted to predict and quantify the growth of
IoT. One of the popular ones is Gartner's survey [43], which predicts that by 2020 the
number of connected smart devices on the Internet would be 26 billion. The steady
growth and the distributed nature of IoT systems and smart object deployments
inherently bring the issue of maintenance and debugging as well as additional privacy
and security concerns. Therefore, large-scale distributed deployment and maintenance is
a challenge for IoT.
• Human Factor
Researchers have always envisioned that one of the key promises of IoT is to improve the
quality of life for human beings. Projects like smart homes and smart cities are aimed
directly in that direction. With such visions, we need to understand that integrating
smart objects into the daily lives of human beings is itself a challenge. Smart objects
cannot always coordinate with the diverse physical or psychological traits of living
beings. Therefore, extensive study and research are needed to blend smart objects into
human lives effectively and make them useful.

• Application Interdependency
The so-called interdependency problem arises when several systems rely on the same data
source to generate independent knowledge or outcomes. Udoh et al. [5] explain the
problem with a scenario in a smart home setting, where a smart energy regulation system
and a smart health-care monitoring system rely on the same sensor readings to save
underlying resources like network bandwidth. These two systems can come up with
separate decisions after processing the same data. They are independent systems and
therefore free to make their own decisions based on separate assumptions. For example,
when no motion is detected from the dwellers, the energy regulation system might decide
to turn the lights off, while the health-care monitoring system decides to turn the
lights on, predicting depression. Thus these two systems may end up with contradicting
decisions even after processing the same dataset. Countermeasures for such
contradictory situations should be in place.
• Diverse Stakeholders
The heterogeneity of the building blocks and the review of the technical ecosystem of
IoT reveal that several stakeholder parties are involved in the life cycle of an IoT
application from development to deployment and maintenance, such as network
administrators, application developers, hardware and security experts, and so on.
Components built or maintained by one party are needed by another party. Thoughtful
coordination among all associated stakeholders is therefore advised.
• Quality Evaluation
The growth of IoT implementations and the execution of innovative ideas in many domains
over the past several years have made the ongoing integration of IoT into human lives
prominent. The IoT paradigm is being used to make sensitive and often life-saving
decisions. Such a high degree of reliance on IoT demands quality evaluation before a
system is made operational. The main challenge in that regard is that several components
inter-operate in order to shape a complete IoT solution, and performance bottlenecks
could be caused by any component in the pipeline. Performance evaluation of
IoT solutions is, therefore, a challenge and a topic of research.
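The application interdependency challenge from the list above can be illustrated with a small sketch. Two independent decision functions read the same motion-sensor value and request opposite states for the same light, and a simple check flags the contradiction. The function names, the decision rules and the `detect_conflicts` helper are hypothetical; they only illustrate why countermeasures are needed and are not the mechanism proposed by Udoh et al. [5].

```python
from typing import Optional


def energy_regulation(motion_detected: bool) -> Optional[str]:
    """Energy saver: switch the lights off when nobody moves."""
    return None if motion_detected else "off"


def health_monitoring(motion_detected: bool) -> Optional[str]:
    """Health monitor: switch the lights on when inactivity is observed."""
    return None if motion_detected else "on"


def detect_conflicts(requests: dict[str, Optional[str]]) -> list[tuple[str, str]]:
    """Return pairs of system names that request different states for the light."""
    active = [(name, state) for name, state in requests.items() if state is not None]
    return [
        (a, b)
        for i, (a, state_a) in enumerate(active)
        for b, state_b in active[i + 1:]
        if state_a != state_b
    ]


if __name__ == "__main__":
    reading = False  # shared sensor reading: no motion detected
    requests = {
        "energy": energy_regulation(reading),
        "health": health_monitoring(reading),
    }
    print(detect_conflicts(requests))
```

Detecting the conflict is only the first step; how it is resolved, for instance by priorities between systems, is a design decision that this sketch deliberately leaves open.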

We observe that the IoT paradigm brings some inherent challenges and open issues with
it, which are often crucial to address in order to provide a reliable solution. Thus
far, we have developed an idea of the diversity of domains and use cases in which the
IoT paradigm is integrated and practiced, with many more coming into existence in the
near future. A potential mitigation of all the aforementioned challenges could stem
from standardizing an appropriate framework. Choosing the right framework helps in
dealing with the heterogeneity of hardware or physical devices. It addresses the
complexity of distributed computing with the necessary abstractions, e.g., middleware,
manages classified information efficiently at large scale and helps to design,
implement and, last but not least, deploy a full-scale IoT solution. A great number of
diverse frameworks have been proposed, including WebRTC, each with its own constraints.
2.2. WebRTC
In this section, we introduce the WebRTC standard and analyze its components and
specifications. We introduce both the classic WebRTC API based on HTTP over TCP and the
newly proposed API based on QUIC, which is envisioned as the successor of HTTP for the
next generation of high-level networking protocols.
Web Real Time Communication (WebRTC) is a new standard for extending the web browsing
model. It enables browsers to exchange real-time media over P2P connections. The World
Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) have defined
the standards and APIs to set up and manage a reliable communication channel between
modern web browsers.
Classic web communication is inherently based on a client/server paradigm, where the
client makes an HTTP request to the web server for some content, to which the server
responds with the content if it is available. The resources provided by the server are
closely associated with a uniform resource identifier. Typically, in the case of a web
application, the server can embed some JavaScript code in the HTML resources to be
executed. This code may use standard APIs to make different services available in the
browser. WebRTC extends the client/server approach with a P2P communication paradigm.
WebRTC draws some inspiration from the so-called SIP trapezoid [6].

Figure 2.4.: SIP Trapezoid [6]
In this WebRTC trapezoid model, browsers in separate systems run a web application
served by different web servers. Signaling messages are set up and managed over HTTP or
WebSockets. Note that the signaling between the browser and the server is not
standardized in WebRTC. For the so-called media path, a P2P connection is established
for direct communication between the browsers without the intervention of a web server.
The two web servers, as we see in Figure 2.4, can communicate using a standard
signaling protocol, e.g., SIP.
However, the more common WebRTC scenario is the one, where browsers in separate sys-
tems are running a web application hosted on a single web server. Here we transition from
trapezoidal to triangular model as shown in figure 2.5.
Figure 2.5.: SIP Triangle [6]

Before going deeper into our study of WebRTC, we should appreciate the role of the web
browser in this network-based orchestration. Traditionally, WebRTC-based web applications
are designed with a blend of HTML and JavaScript to interact with the browser engine and
exploit the standardized WebRTC API to support real-time browser functions. Thus WebRTC
inherently relies upon several native capabilities of modern web browsers, as shown in
figure 2.6.
Figure 2.6.: WebRTC browser reliance [6]
The WebRTC API needs to support a considerably large set of functions such as NAT and
firewall traversal, P2P connection management, media encoding and decoding capabilities
and control, and so on. We also need to focus on the fact that WebRTC relies on a continuous
flow of data across the network with no further intermediaries, which is challenging and at
the same time a potentially game-changing approach in web-based communication.
2.2.1. The Session Initiation Protocol
Before going to the architecture layer of WebRTC, we take a moment to appreciate the
Session Initiation Protocol (SIP), which has a key role to play in traditional WebRTC
implementations. As per Kurose and Ross [44], SIP is a lightweight application-layer control
protocol that can establish, manage, and terminate multimedia sessions such as Internet
telephony calls.
• SIP provides the mechanism for establishing the call thereby letting the caller notify
the callee about the intention to start a call along with the necessary media encoding.
• It provides the capability to determine the network address of the callee at a given
point in time.
• It provides the management control and capabilities over the call such as adding a new
media stream or inviting new participants during the call and so on.
As per Rosenberg et al. [45] SIP supports five aspects of multimedia communication.
• User Location: Determine the end system for the communication.
• User Availability: Determine the willingness of the callee to engage in communi-
cation.
• User Capabilities: Determine the media constraints and parameters.
• Session Setup: Also known as ringing, deals with setting up session parameters at
both ends of a call.
• Session Management: Encapsulates the transfer and termination of sessions as well as
the management of session parameters and services.
At this point, we would like to appreciate the fundamental idea that SIP is not a fully featured
communication system itself. SIP is just a key component that can be used in conjunction
with other IETF protocols to build a capable multimedia application architecture.
2.2.2. WebRTC Architecture
As per WebRTC’s official documentation [7] figure 2.7 describes the overall architecture of
WebRTC.

Figure 2.7.: WebRTC Architecture [7]
Web applications are generally those developed by third-party developers that use the
video and audio capabilities offered by the WebRTC API for real-time communication.
Apart from Google Hangouts, popular communication-driven applications such as Facebook
Messenger, Discord, and Amazon Chime are prominent examples that use the WebRTC
architecture as their backbone.
The WebRTC API layer encapsulates a wide range of utilities that application developers can
use to develop their real-time multimedia communication-based applications. We will
discuss this layer in greater detail in the upcoming section.
The WebRTC Native C++ API is of particular interest for browser vendors, who expect
higher control over the low-level system architecture and optimization possibilities for
better performance. This layer helps in building the Web API proposals and also functions
as the software interaction layer that connects the Web API to the underlying hardware
components such as a webcam or a microphone.
This brings us to the point, where we want to discuss the three key pillars forming the un-
derlying backbone of the architecture.
• Transport / Session encapsulates components from Google's libjingle SDK, which
provides the framework and capabilities to build P2P applications across heterogeneous
networks. This layer includes protocols and specifications for real-time communication.
Apart from that, it provides the Session Traversal Utilities for NAT (STUN) as well as
Interactive Connectivity Establishment (ICE) utilities to aid the establishment of
Peer-to-Peer (P2P) connections across various kinds of networks, taking firewalls,
relay servers, and proxies into account. This layer works as an abstraction of the session
layer from the Open Systems Interconnection (OSI) model that helps to establish and
manage calls.
• The Voice Engine is the backbone of the audio component that is shared between the
caller and the callee. This layer provides support for narrowband and wideband codecs
for Voice over IP (VoIP) and streaming audio. It also supports constant and variable
bit-rate encoding to maintain the audio quality and to compensate as well as possible
for noise and loss in the audio.
• The Video Engine is the backbone of the video component, just as the voice engine is
for the audio. The video engine supports the codecs that enable low-latency video
encoding and decoding. It also alleviates the negative effect of packet loss due to poor
network conditions and enhances the video frames for a high-quality video communication
experience.
2.2.3. WebRTC API
According to Loreto et al. [6], the WebRTC API is inherently designed to extend the browser
functionalities to provide audio/video media streams. All media streams are secured using
Datagram Transport Layer Security (DTLS). DTLS provides a layer of security over the
connectionless User Datagram Protocol (UDP). It attempts to prevent eavesdropping, message
forgery, and tampering. The IETF is working to set a minimum standard for the video and
audio codecs that must be supported by all WebRTC wrappers and frameworks in order to
ensure interoperability among different web browsers. We will discuss the three main
components of the WebRTC standard API.

MediaStream
MediaStream is an abstract representation of an actual audio or video data stream and allows
us to manage the functions that we can perform on the stream, such as displaying it in the
foreground, recording it, or transmitting it over a P2P connection. It can represent both the
outgoing and the incoming stream of audio and video.
Typically, we can get access to the local media stream captured by webcams or microphones
by calling the JavaScript getUserMedia() method on the MediaDevices interface, which is
exposed through the navigator object of supported web browsers. The MediaDevices interface
provides access to any hardware source capable of providing media-related data. This process
requires the user's permission to share the native media, and it also takes into account what
kind of data we are requesting, e.g., audio, video, or a combination of both. MediaStream
maintains the audio and video streams separately as tracks, each of which is encapsulated by
a MediaStreamTrack object.
For the transmission, UDP is used, but we also need to introduce some means of detecting
loss and reordering in the media transmission. For this purpose, the media is carried in
Secure Real-time Transport Protocol (SRTP) packets, which are in turn sent over UDP. The
Real-time Transport Control Protocol (RTCP) is used to carry the transmission statistics.
DTLS is used to negotiate the keys that protect the SRTP packets and to aid the secure
session management.
Loreto et al. [6] describe that in multimedia communication, media are generally carried
by RTP sessions, which have dedicated RTCP packets as well to aid reliability and error
handling. One downside of this approach is the need to open a separate hole through Network
Address Translation (NAT) using STUN for each dedicated media stream, since each stream
involves separate ports. Work is being done to mitigate this challenge by accommodating
multiple streams between a given caller and callee in a single Real-time Transport Protocol
(RTP) session.
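To make the RTP session described above more tangible, the following minimal Python sketch
packs and parses the 12-byte fixed RTP header. The field layout follows the RTP specification
(RFC 3550); the function names and the sample values (payload type 111, the SSRC value) are
our own illustrative assumptions and are not taken from any WebRTC implementation.

```python
import struct

RTP_VERSION = 2  # RTP version defined by RFC 3550


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRC)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed header back into a dictionary of fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Illustrative values only (hypothetical payload type and SSRC).
header = pack_rtp_header(payload_type=111, seq=1, timestamp=960, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
```

Since every packet carries a sequence number and a synchronization source identifier (SSRC),
a receiver can detect gaps and tell sources apart, which is one prerequisite for carrying
several streams over a single transport.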
PeerConnection
PeerConnection is the WebRTC API component that allows two users to communicate directly
from browser to browser. From the programming perspective, it represents the logical
association with the remote peer, which is essentially another instance of the same program
as the local peer. The connection establishment between two peers is handled by a process
called signaling with the help of XMLHttpRequest or WebSocket. At this point, we are going
to illustrate the idea of signaling.
As per Loreto et al. [6], the WebRTC concept has always revolved around the management of
the media stream without much concern for the signaling process, which is mostly delegated
to the application layer. Application developers are free to come up with their own
mechanism to deal with the signaling, e.g., using SIP, the Extensible Messaging and
Presence Protocol (XMPP), or several other possible options. The most significant piece of
information that needs to be exchanged as part of the signaling is the session description.
The session description encapsulates the constraints concerning media type, format,
necessary encoding strategy, and several other parameters. It also specifies the transport
ICE information, on which we are going to elaborate in a while.
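The following Python sketch illustrates what such a session description may look like when
it is expressed in the SDP text format. It is a deliberately reduced example: the helper
names, the session identifier, and the payload type numbers are assumptions made for
illustration, and a real offer would carry further lines (for instance connection and ICE
attributes) that are omitted here.

```python
def build_offer(session_id, media):
    """Return a reduced SDP text; `media` lists (kind, payload_type, codec)."""
    lines = [
        "v=0",                                   # protocol version
        f"o=- {session_id} 1 IN IP4 0.0.0.0",    # origin of the session
        "s=-",                                   # session name (unused)
        "t=0 0",                                 # unbounded session time
    ]
    for kind, payload_type, codec in media:
        # One media section per kind, announcing a single payload type.
        lines.append(f"m={kind} 9 UDP/TLS/RTP/SAVPF {payload_type}")
        lines.append(f"a=rtpmap:{payload_type} {codec}")
    return "\r\n".join(lines) + "\r\n"


def parse_media(sdp):
    """Map each media kind to the codec announced in its a=rtpmap line."""
    result, current = {}, None
    for line in sdp.splitlines():
        if line.startswith("m="):
            current = line[2:].split()[0]
        elif line.startswith("a=rtpmap:") and current is not None:
            result[current] = line.split(" ", 1)[1]
    return result


# Hypothetical session identifier and payload types, for illustration only.
offer = build_offer(1234567890, [("audio", 111, "opus/48000/2"),
                                 ("video", 96, "VP8/90000")])
media = parse_media(offer)
```

The peer that receives such a text can inspect the media sections and answer with the subset
of media types and codecs it is able to support.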
Traditionally, the Session Description Protocol (SDP) was the standard way to exchange
session information. It has been observed that SDP has certain challenges. Typically, it uses
the media description field to mention all kinds of media that are agreed to be exchanged
between peers. SDP also allows peers to mention a time frame required to construct a media
packet. This time frame impacts the construction of packets for all kinds of media mentioned
in the media description field. Apparently, we cannot specify differing packet construction
times for different kinds of media payload. However, the actual time required may differ for
audio and video media stream packets, and mentioning one fixed time for packet construction
in the SDP may affect the quality of the media at the receiver end, according to
M. Willekens et al. [46].
Application developers may, however, use proprietary mechanisms to specify the time to
construct a packet per media type payload, but this could very well lead to interoperability
problems. In order to mitigate such problems, the IETF attempts to standardize the JavaScript
Session Establishment Protocol (JSEP) as an alternate standard for SDP. JSEP provides more
control to the application developer to drive the signaling mechanism. It translates the
session description and related ICE information to messages of the signaling protocol that
the application developer chooses to use.
Before going further, we need to understand the role of NAT in the WebRTC orchestration and
the impact of ICE on it. Rosenberg et al. [47] specify that a number of WebRTC clients are
typically behind a NAT or firewall, and since the use of UDP is prescribed, any media stream
will encounter issues with NAT traversal, limiting the capabilities of WebRTC.
The countermeasure for this problem is that the clients engaged in the communication use
a NAT traversal algorithm. Therefore, the PeerConnection component uses the ICE protocol in
conjunction with the STUN and Traversal Using Relays around NAT (TURN) strategies.
STUN helps a host application to discover its public IP address while being behind the
firewall. This address is provided to a second host that wants to establish a connection with
the first host application. In order to achieve this, a configured third-party STUN server is
necessary in the public network. When this strategy for establishing a connection fails,
another service called TURN comes to the rescue. TURN lets a host application behind a NAT
obtain a public IP address and port on a relay server configured in the public network,
through which the traffic of the connection is relayed. Thus, STUN and TURN together make it
possible for potential peers to determine the network address to reach each other and
establish the peer connection. The ICE protocol helps the peers to identify the network in
between and determine the optimal path to establish the connection. Furthermore, ICE provides
a security mechanism that helps to prevent unexpected attacks or security threats from the
public network.
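As a small illustration of the first step of this discovery, the Python sketch below builds
the 20-byte header of a STUN Binding Request. The message type and the magic cookie are the
fixed values from the STUN specification (RFC 5389); the function names are our own, and the
sketch neither sends the request nor parses a server response.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001
STUN_MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request without attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)          # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Message type, message length (0 because no attributes follow), cookie.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def parse_header(message):
    """Split a STUN header into type, length, cookie, and transaction ID."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    return msg_type, length, cookie, message[8:20]


# A fixed transaction ID keeps the example deterministic.
request = build_binding_request(b"\x00" * 12)
msg_type, length, cookie, transaction_id = parse_header(request)
```

In a real exchange, the request would be sent over UDP to a STUN server in the public
network, and the response would report the address from which the server saw the request
arrive.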
An intermediary web server assists the potential peers in exchanging the signaling messages.
The standard JavaScript WebRTC API offers the RTCPeerConnection and RTCSessionDescription
interfaces to create a PeerConnection object and enable bidirectional data exchange.
DataChannel
The DataChannel API relies on a P2P connection to exchange various kinds of data between
web browsers supporting WebRTC. According to Loreto et al. [6], the IETF prescribed the use
of the Stream Control Transmission Protocol (SCTP) guarded with DTLS to exchange generic
data types apart from multimedia streams.
In the PeerConnection API component, we have discussed the use of UDP to wrap the bytes
of data and the ICE protocol, which together provide a lightweight solution for data
transmission even between peers that are located in private networks behind NAT firewalls.
Running SCTP over DTLS on top of UDP makes an efficient and secure data exchange possible.
One of the most exciting advantages of this strategy is that peers can exchange generic
non-multimedia data along with media streams using the same port number. A real-world
analogy would be an application such as Skype, where peers can exchange text messages and
various file types while being on a video call with the recipient at the same time.
We need to appreciate the fact that SCTP is a viable choice because of its capability
to support multiple data streams. It can manage multiple streams within a single SCTP
session with a considerable degree of reliability. Loreto et al. [6] describe a stream as a
unidirectional channel that supports a sequential flow of arbitrary data packets. The
DataChannel API is capable of operating bidirectionally, and it encapsulates an incoming and
an outgoing SCTP stream. The JavaScript WebRTC API offers the RTCDataChannel interface to
work with secure DataChannels. Typically, a DataChannel is created by calling the
createDataChannel() method on an RTCPeerConnection object.
So far we have presented a classic overview of the WebRTC architecture and the API that it
offers web developers to create web-based real-time communication applications. However, in
the recent past, some notable changes took place in Internet demographics and usage as a
whole. The IETF came up with new paradigms for Internet communication to achieve better
performance and security. Naturally, such research works have an impact on WebRTC.
2.2.4. Emergence of New Communication Paradigms
Until now we discussed that WebRTC inherently relies upon UDP. This is because TCP
brings the additional overhead of a three-way handshake before establishing the connection.
This makes the TCP connection establishment procedure slower compared to UDP.
However, apart from being a slower transport protocol, TCP has significant advantages over
UDP, such as reliable message delivery and built-in congestion control and flow control
mechanisms, which are crucial factors in Internet communication. Therefore, the classical
HTTP protocol heavily depends on TCP. Thus, we can imagine that an ideal choice of
transport protocol for WebRTC would be one that allows us to establish the connection
faster while at the same time providing the added advantages that TCP generally offers.
The emergence of HTTP 2.0 and QUIC makes us reconsider the connection-oriented
approach for WebRTC. Before we elaborate on this, we need to explore HTTP 2.0 and QUIC
in greater detail.

HTTP 2.0
Google developers [48] and developer advocate and author Ilya Grigorik [49] describe the
primary goals of HTTP 2.0 as follows:
• Reducing latency by implementing the request and response multiplexing.
• Compressing the HTTP header field to reduce protocol overhead.
• Supporting request prioritization and server push.
HTTP 2.0 focuses on how the data is formatted and transported between the client and
server. The low-level complexities remain hidden from the application layer with the
help of a new framing layer that provides the necessary abstraction. In order to achieve the
desired performance, HTTP 2.0 introduces a binary framing layer that is not backward
compatible with HTTP 1.x. Hence, the implementation of the new HTTP 2.0 called for a new
version number. Before we illustrate HTTP 2.0 in context, we need to discuss the SPDY
protocol because of their close relationship.
SPDY is an experimental protocol, which was developed mainly to lower the page loading
time in the browser. SPDY mainly serves as a testbed for features before they are
introduced as definitive features of HTTP 2.0. SPDY uses TCP as the underlying transport
layer to make itself compatible with the existing networking infrastructure. Implementing
SPDY requires the client's browser and the web servers to support the SPDY protocol
specification, while the existing web content can remain operational as before. SPDY
relies on the Secure Sockets Layer (SSL) to provide security and enables the possibility for
a server to communicate back to the client.
SPDY introduces a session layer over SSL to provide multiple concurrent data streams over a
single TCP connection. With SPDY specifying a new framing format, existing HTTP method
constructs remain the same. Two definitive features of the SPDY protocol are Server Push
and Server Hint.
• The Server Push feature of SPDY enables the server to initiate a stream directed
towards the client to offer useful content that the client might need without asking for it.
This is made possible by introducing an associated content header in the SPDY protocol.
• Server Hint enables SPDY to make smart decisions about which content the client
might need and provides these hints to the client side to speed up information delivery
and improve convenience. In order to provide the Server Hint feature, SPDY makes use of
the sub-resources content header.
These features not only make it possible for SPDY to offer the required information to the
client faster, but they also help to reduce the flow of packets in the client-server
communication and thus save network bandwidth.
Now that we have introduced the SPDY protocol specifications, we need to understand the key
promises of HTTP 2.0. As per Google developers [48], on top of the key promises of the
SPDY protocol, HTTP 2.0 enables prioritization of requests, thereby allowing more crucial
requests to be served with priority and improving the overall browsing experience for the
end user. The following are the key features of HTTP 2.0 at a glance.
• Binary Framing Layer
– The newly introduced Binary Framing Layer defines a new way of encapsulating
and transferring HTTP messages. It provides a design choice to encode the
HTTP header components into smaller messages instead of plain text as in HTTP
1.x. Thus all HTTP 2.0 communication relies on the exchange of smaller messages
encoded in binary format.
– This requires both client and server to understand the same binary encoding
mechanism in order to make the communication possible, and this is also the
reason why HTTP 2.0 is not interoperable with HTTP 1.x. An HTTP 1.x client
will not be able to communicate with an HTTP 2.0-only server.
• Streams, Messages, and Frames
– HTTP 2.0 defines a stream as a bidirectional byte flow within an established
connection, a message as a logical sequence of frames, and a frame as the smallest unit
of communication, with a frame header identifying the parent stream. These
logical definitions enable HTTP 2.0 to carry an arbitrary number of concurrent
streams, each identified by a stream identifier, over a single TCP connection.

• Request-Response Multiplexing
– The ability to break down a single message into several interleaved frames and
to reassemble them on the receiving end in order to retrieve the message is
a definitive feature of HTTP 2.0.
– This approach attempts to mitigate the blocking scenarios for requests and
responses. The ability to use a single connection to deliver multiple
requests and responses reduces latency and boosts utilization of the network's
capacity.
• Stream Prioritization
– The interleaving of HTTP 2.0 streams creates the possibility of assigning
a stream a weight and/or a dependency upon another stream. This combination
of weight and dependency allows the client to create a priority tree for the
requests it sends. This structure allows the server to determine how the client
would like to receive the responses and to prioritize the associated streams
accordingly.
– The combination of stream weight and dependency mapping attempts to establish a
transport preference. It is not a guarantee. Clients cannot force the server to
process streams or to send responses in a particular order, which is again an
expected behavior.
• Flow Control
– As per Google developers [48], HTTP 2.0 provides simple building blocks for flow
control and leaves it to the client and the server to implement the actual
strategies and resource usage.
– Flow control in HTTP 2.0 is directional; it is established when the client and
server make the initial connection, and it cannot be disabled. Flow control in
HTTP 2.0 is regulated on a hop-by-hop basis, where intermediate nodes use it
to control their resources.

• Server Push
– In addition to the resources the client has requested, the server can provide
additional information that might be helpful for the client. This opens a new
world of possibilities and is a breakthrough in traditional client-server interaction.
– Server Push is initiated by the server with PUSH_PROMISE frames, which
describe the intent of the server to push extra information. Upon receiving the
offer, a client can decide to accept or reject it. This keeps the client
in control of the server push mechanism. The server push mechanism also
creates individual streams for individual resources, which can be regulated like any
other stream in HTTP 2.0.
• Header Compression
– Google developers [48] describe the request and response header related changes
as follows. HTTP 1.x handled the HTTP headers in textual format, which
added an overhead of around 500 to 800 bytes per transfer. HTTP 2.0 has proposed
a compression format called HPACK to mitigate this overhead.
– The HPACK compression format prescribes encoding the header with Huffman coding,
which reduces the size of the header and adds some out-of-the-box security
attributes to the header. However, it requires the client and server to keep track
of an indexed list of previously transferred header fields, essentially creating a
shared state of header information between client and server, which helps in decoding
and retrieving the header information at the receiver end.
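The framing and multiplexing ideas from the list above can be sketched in a few lines of
Python. The 9-byte frame header layout (24-bit length, 8-bit type, 8-bit flags, 31-bit
stream identifier) follows the HTTP/2 specification (RFC 7540); the function names and the
payload bytes are illustrative assumptions, and real requests would additionally use
HEADERS frames with compressed header blocks, which this sketch leaves out.

```python
import struct

FRAME_DATA = 0x0        # DATA frame type
FLAG_END_STREAM = 0x1   # marks the last frame of a stream


def encode_frame(stream_id, payload, frame_type=FRAME_DATA, flags=0):
    """Prefix the payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF,
                         frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def decode_frames(data):
    """Yield (stream_id, flags, payload) for each frame in a byte string."""
    offset = 0
    while offset < len(data):
        hi, lo, _ftype, flags, stream_id = struct.unpack_from("!BHBBI", data, offset)
        length = (hi << 16) | lo
        start = offset + 9
        yield stream_id & 0x7FFFFFFF, flags, data[start:start + length]
        offset = start + length


def reassemble(data):
    """Rebuild one message per stream from interleaved frames."""
    messages = {}
    for stream_id, _flags, payload in decode_frames(data):
        messages[stream_id] = messages.get(stream_id, b"") + payload
    return messages


# Frames of streams 1 and 3 interleaved on one connection (arbitrary payloads).
wire = (encode_frame(1, b"AAAA") + encode_frame(3, b"cccc")
        + encode_frame(1, b"BBBB", flags=FLAG_END_STREAM)
        + encode_frame(3, b"dddd", flags=FLAG_END_STREAM))
messages = reassemble(wire)
```

Because each frame header names its parent stream, the receiver can sort interleaved frames
back into per-stream messages, which is what allows several requests and responses to share
one TCP connection.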
At this point in the discussion, we need to appreciate the position of HTTP over TCP in
the traditional network stack as the default standard. HTTP is pretty much the backbone of
the traditional Internet as we know it. HTTP 2.0 has promising mitigation plans for some of
the core challenges of the TCP-based HTTP 1.x generation of network protocols. However, no
matter how promising HTTP 2.0 sounds, it is still not completely free of all TCP bottlenecks.
Moreover, all of the existing web applications are somewhat constrained by the available
features of the present standard network stack. Lasse Lumiaho et al. [50] argue that this
constraint alone is the motivation for many organizations to explore the new frontier,
which often deals with connectionless UDP. Pioneers in the research effort on UDP-based
network protocols believe that the development of a UDP-based network protocol may lead
to even quicker connection establishment and improved error and flow control, and may
leverage decades of research work on the traditional network stack. Figure 2.8 shows a
standardized network stack as proposed by QUIC.
Figure 2.8.: Standardized stack of QUIC [8]
QUIC
The Chromium Project by Google in collaboration with the IETF [51] describes Quick UDP
Internet Connections (QUIC) as the self-contained next-generation transport protocol for
Internet communication built on top of UDP. The key promises of QUIC over TCP+TLS+HTTP 2.0
include faster connection establishment, forward error correction, congestion control,
multiplexing without head-of-line blocking, and connection migration.
In most interactions, QUIC does not require a handshake before sending the payload to
the server, apart from the first communication between client and server. Earlier, in its
inception phase, QUIC featured QUIC Crypto as the security mechanism during the handshake
before connection establishment. The main feature of QUIC Crypto was for the client to
cache the information about the server on the first interaction in order to establish
encrypted connections for the subsequent interactions without any further round trips. With
the advent of TLS 1.3, which provides a similar and more efficient feature, QUIC Crypto has
been replaced by TLS 1.3 as the standard initial handshake and security mechanism for QUIC.
Upon connection initiation, one round trip of handshaking is required, where the client sends
an empty client hello (CHLO) message to the server and the server replies with a rejection
(REJ) message containing the source address token and the server certificate, which are
generally cached at the client end. For all subsequent communication between server and
client, the cached information can be used to avoid further handshaking and speed up the
communication.
QUIC offers a pluggable congestion control mechanism, in contrast to TCP, with an
implementation of CUBIC, which is an efficient congestion control mechanism designed for
networks featuring high latency. CUBIC is a congestion control variant of TCP, which sets
the congestion window as a cubic function of the time since the last congestion event. The
QUIC implementation of congestion control generates richer state information compared to
traditional TCP. One prominent example is that QUIC assigns a new sequence number
to both original and retransmitted packets, which makes it possible to distinguish the
ACK for the original packet from the ACK for the retransmitted packet. This mechanism
mitigates the retransmission ambiguity problem of traditional TCP.
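The cubic growth mentioned above can be written down directly. The sketch below uses the
window function W(t) = C (t - K)^3 + W_max with K = cbrt(W_max (1 - beta) / C), taking the
constants C = 0.4 and beta = 0.7 from the CUBIC specification (RFC 8312). It is only an
illustration of the curve, with window sizes counted in segments; it does not model the
QUIC implementation or the additional rules that a complete CUBIC implementation applies.

```python
C = 0.4      # scaling constant suggested in RFC 8312
BETA = 0.7   # the window shrinks to BETA * w_max after a congestion event


def cubic_k(w_max):
    """Time in seconds the curve needs to climb back to w_max."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t, w_max):
    """Congestion window t seconds after the last congestion event."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100.0                                  # window before the loss
start = cubic_window(0.0, w_max)               # right after the loss
plateau = cubic_window(cubic_k(w_max), w_max)  # back at the old maximum
probe = cubic_window(10.0, w_max)              # probing beyond the old maximum
```

The curve starts at BETA times the old maximum, flattens as it approaches the old maximum,
and only then grows quickly again, which is why the window depends on elapsed time rather
than on the round-trip time.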
Both classical HTTP and TCP suffer from a drawback called head-of-line blocking. In
HTTP 1.x, each client or browser has a limited number of connections to a server. In order
to make a new request to a given server over one of these connections, the earlier request
on it must be completed first. HTTP 2.0 proposed a solution to this issue by allowing a
browser to issue a new request over the same connection without waiting for the previous one.
However, HTTP 2.0 suffers from its own head-of-line blocking issue caused by the underlying
TCP layer. The application sees a TCP connection as a stream of bytes. If a TCP packet is
lost, no stream in the HTTP 2.0 connection can move forward until the lost packet is
retransmitted and received at the receiver end.
QUIC has promising features to address this drawback of HTTP 2.0. In QUIC, each stream
is independent. Hence, a lost packet in a given stream only affects that specific stream.
Each frame in a stream can be dispatched immediately on arrival. Hence, streams without
loss can proceed as usual.
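The difference between the two delivery models can be illustrated with a toy simulation in
Python. The packet tuples and function names are hypothetical, and the model ignores
retransmission entirely; it only shows which data the receiver may hand to the application
while one packet is still missing.

```python
def deliver_single_stream(packets, lost):
    """TCP-like: bytes are released in order, so delivery stops at the first gap."""
    delivered = []
    for seq, stream_id, data in packets:
        if seq in lost:
            break                      # everything behind the gap must wait
        delivered.append((stream_id, data))
    return delivered


def deliver_per_stream(packets, lost):
    """QUIC-like: only the stream that owns the lost packet is held back."""
    blocked, delivered = set(), []
    for seq, stream_id, data in packets:
        if seq in lost:
            blocked.add(stream_id)     # later data of this stream must wait
        elif stream_id not in blocked:
            delivered.append((stream_id, data))
    return delivered


# Two streams, A and B, share one connection; packet 2 (stream B) is lost.
packets = [(1, "A", "a1"), (2, "B", "b1"), (3, "A", "a2"), (4, "B", "b2")]
lost = {2}

tcp_like = deliver_single_stream(packets, lost)
quic_like = deliver_per_stream(packets, lost)
```

In the single-stream model, stream A is stalled by a loss that belongs to stream B, whereas
in the per-stream model stream A receives all of its data.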
Forward Error Correction (FEC) is a measure adopted by QUIC in order to aid fast recovery
from packet loss without waiting for the retransmission of packets. QUIC can
attach an FEC packet to a specific group of packets. The FEC packet contains the parity of
all packets in the FEC group. In the event of a packet loss, the contents of the lost
packet can be recovered with the help of the FEC packet. FEC packets can be sent with
high-priority packets in a low-bandwidth environment to optimize the communication.
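A parity-based recovery of this kind can be sketched with a simple XOR scheme in Python.
The sketch assumes that all packets of a group have the same length and that at most one
packet of the group is lost; the function names are our own, and the sketch is not meant
to reproduce the exact FEC format used by QUIC.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group):
    """Parity packet: XOR of all packets in the group (equal lengths assumed)."""
    return reduce(xor_bytes, group)


def recover(received, parity):
    """Rebuild the single missing packet from the survivors and the parity."""
    return reduce(xor_bytes, received, parity)


group = [b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(group)
survivors = [group[0], group[2]]          # packet 2 was lost in transit
restored = recover(survivors, parity)
```

XORing the parity with all surviving packets cancels their contribution and leaves exactly
the missing packet, so no retransmission is needed as long as only one packet per group is
lost.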
In the classical TCP scenario, a connection is typically identified by source and destination
addresses as well as source and destination ports. This means that if a client changes its IP
address, for example by changing the network, the port association table in the NAT mechanism
is reset and TCP connections are immediately invalidated. In contrast, QUIC connections are
identified by 64-bit connection identifiers, which are typically chosen at random by the
client. This means that if the source and destination addresses or the source and destination
port numbers change, QUIC can continue the connection with the help of the connection
identifier.
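The following Python sketch contrasts the two ways of identifying a connection. Both server
classes are hypothetical toy models: the TCP-like server is reduced to the client side of
the address tuple, and the QUIC-like server simply follows the client to its new address
without any of the validation steps that a real implementation would perform.

```python
import secrets


class TcpLikeServer:
    """Identifies a connection by the client's (IP address, port) pair."""

    def __init__(self):
        self.connections = set()

    def accept(self, address):
        self.connections.add(address)

    def receive(self, address):
        return address in self.connections      # a new address is unknown


class QuicLikeServer:
    """Identifies a connection by its connection ID, whatever the address."""

    def __init__(self):
        self.connections = {}

    def accept(self, connection_id, address):
        self.connections[connection_id] = address

    def receive(self, connection_id, address):
        if connection_id not in self.connections:
            return False
        self.connections[connection_id] = address   # follow the client
        return True


wifi = ("192.0.2.10", 50000)          # addresses from documentation ranges
mobile = ("198.51.100.7", 41000)
connection_id = secrets.randbits(64)  # 64-bit identifier chosen by the client

tcp, quic = TcpLikeServer(), QuicLikeServer()
tcp.accept(wifi)
quic.accept(connection_id, wifi)

# The client moves from Wi-Fi to a mobile network and keeps sending.
tcp_survives = tcp.receive(mobile)
quic_survives = quic.receive(connection_id, mobile)
```

After the address change, the TCP-like lookup no longer finds the connection, while the
lookup by connection identifier still succeeds.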
Now that we have appreciated the emerging paradigms of modern network communication,
we need to connect the dots. Our present discussion is about the impact of these emerging
network paradigms on WebRTC. Ian Swett gave an informative presentation [52] on the
potential combination of QUIC and WebRTC. We have discussed earlier in this section how the
WebRTC DataChannel API relies upon SCTP and DTLS to securely exchange data between
web browsers. QUIC shares several behaviors with SCTP and DTLS in the context of
WebRTC, as we describe in table 2.1 below.
Similar Behaviors | Summary
Multiplexed Protocol | QUIC supports considerably large numbers of concurrent data streams with TLS 1.3, which goes hand in hand with the conventions of SCTP secured by DTLS.
Flow Control | QUIC supports stream-level and connection-level flow control and does it more efficiently compared to HTTP over TCP, which is preferable for DataChannel exchange of audio and video data.
Encryption | QUIC messaging is always encrypted and secured by Transport Layer Security, which is a required feature for audio, video, and arbitrary data transmission.
Out of order delivery | Unlike TCP, QUIC does not suffer from head-of-line blocking, but we have to detect out-of-order anomalies at the receiver end and take countermeasures. This makes QUIC play well with real-time communication scenarios.
Table 2.1.: Similar behavior that QUIC shares with SCTP and DTLS

Jesup et al. [53] describe the advantages of using UDP in WebRTC DataChannels. Normally
in WebRTC, media data is transported over SRTP and non-media data is entrusted to
SCTP. The encapsulation of SCTP/DTLS over UDP/ICE provides NAT traversal and
confidentiality, and works hand in hand with the media transport over SRTP using
a single UDP port number. SCTP provides native support for multiple streams and for both
unreliable and reliable data channels. In everyday scenarios, we choose unreliable
data channels to transport non-critical information and reliable data channels otherwise.
Ian Swett in his talk [52] explained how QUIC has the potential to have a positive impact on
WebRTC compared to traditional SCTP. We summarize those impacts here in accordance
with the Internet-Draft published by A. Joseph et al. [54].
• Connection Establishment: SCTP uses a four-way handshake together with digitally
signed cookies to prevent denial-of-service attacks. While being secure, this procedure
adds delay, which translates directly into latency for multimedia communications.
QUIC connection establishment starts with version negotiation to verify that both ends
of the communication are using the same QUIC version. Version negotiation is followed
by the cryptographic handshake. QUIC is able to reduce this two-step procedure
to one step, potentially eliminating the need for a round trip. Thus connection
establishment is faster in QUIC, contributing towards lower latency.
• Reduced Size of Header Frame: WebRTC using SCTP over DTLS imposes a header
overhead of around 60 bytes per packet. This can be a challenge especially for audio
communication, where payloads are small. The QUIC header, often called a short header,
needs around 32 bytes including the UDP packet header. This makes packet transmission
significantly more lightweight and saves bandwidth, especially for multimedia
communication.
• Multiplexing/Sub-Streams: We discussed the inherent head-of-line blocking issue of
TCP. SCTP attempts to mitigate this issue, but with some added challenges. In SCTP, the
Transmission Sequence Number (TSN) that identifies one data chunk is drawn from a
sequence shared with all other data chunks. On the other hand, each packet can carry
data chunks from several streams, and if a data chunk of one stream is lost, the other
streams remain unaffected. When an acknowledgment is received, its TSN indicates the
last data chunk received by the receiver. The issue is that if one TSN is missing from the
acknowledgment, the data chunks with later TSNs are not considered received by the
recipient until the lost TSN is retransmitted.
QUIC, on the other hand, identifies packets with a stream id, an offset value, and an
increasing packet sequence number. All lost packets are retransmitted under a
new packet sequence number. Thus the loss of a packet does not affect the packets
sequenced after it. This potentially addresses the head-of-line blocking problem.
• Fragmentation: SCTP relies on messages defined by the application. This creates
overhead for SCTP, which must fragment each application-specific message into data
packets.
QUIC, however, supports the bidirectional byte stream model, which is compatible
with the existing TCP-based HTTP/2. This makes it easier for existing applications
to migrate to QUIC.
• Reliability and Congestion Control: Reliability is measured by whether the delivery
of every message sent is guaranteed. Congestion control, on the other hand, describes
how a sender limits the amount of data it sends in order to avoid clogging the
network, which results in packet loss. Both QUIC and SCTP provide their own
implementation of reliability and congestion control mechanisms. SCTP mostly borrows
its congestion control strategies from TCP. QUIC borrows some ideas from TCP as well
and also implements its own congestion control strategies.
• Flow Control: Flow control is similar to congestion control, with the main difference
that flow control focuses on the receiving endpoint rather than the network.
It is the limit with which a receiver advertises how much incoming data it can accept,
in order to prevent packet loss. SCTP handles flow control at the level of the
association between the sender and the receiver. In QUIC, however, flow control can be
applied both at the connection level and at the sub-stream level. Two connection
parameters, namely MAX_DATA and MAX_STREAM_DATA, are settled during connection
establishment, and thereafter both endpoints engaged in the connection must abide by
the values set in those parameters.
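The two levels of QUIC flow control described in the last item can be illustrated with a small model. This is only a sketch under simplifying assumptions: the class `FlowControlledReceiver` and its methods are hypothetical names introduced for illustration, the limits are fixed for the lifetime of the object, and no real QUIC framing is involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowControlledReceiver:
    """Toy receiver that enforces a connection-level and a stream-level limit."""

    max_data: int          # limit for all streams together (assumed fixed here)
    max_stream_data: int   # limit for each individual stream (assumed fixed here)
    received: Dict[int, int] = field(default_factory=dict)  # stream id -> bytes accepted

    def accept(self, stream_id: int, size: int) -> bool:
        """Account for `size` bytes if both limits are respected, else refuse."""
        stream_total = self.received.get(stream_id, 0) + size
        connection_total = sum(self.received.values()) + size
        if stream_total > self.max_stream_data or connection_total > self.max_data:
            return False  # a real sender would have to wait before sending more
        self.received[stream_id] = stream_total
        return True


receiver = FlowControlledReceiver(max_data=1000, max_stream_data=600)
print(receiver.accept(stream_id=1, size=600))  # True: fits both limits
print(receiver.accept(stream_id=1, size=1))    # False: stream limit exceeded
print(receiver.accept(stream_id=2, size=400))  # True: connection limit exactly reached
print(receiver.accept(stream_id=3, size=1))    # False: connection limit exceeded
```

The point of the sketch is that a stream can be blocked by its own limit while the connection still has room, and that an otherwise idle stream can be blocked because the connection as a whole is full.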

Now that we have made a comparative argument in favor of QUIC, we point out some of the
specific areas in WebRTC which QUIC could considerably improve.
• Ian Swett [52] mentioned that, unlike SCTP, QUIC provides the possibility of cancelable
streams. Fully reliable delivery, as known from TCP, forces the retransmission of data
that may no longer be needed, which is a cause of overhead. QUIC runs over UDP and lets
a stream be canceled in order to avoid this overhead, but it is still not fully unreliable.
• QUIC has XOR-based forward error correction (FEC) features.
• With SCTP, the available congestion control algorithms are those inherited from TCP,
such as CUBIC or TCP Reno. These are not ideal choices for real-time media data, as
they carry the added risk of introducing unwanted latency.
• QUIC connections are identified by 64-bit connection identifiers. Hence, if one or
both of the communicating parties change their network interfaces during the
communication, the QUIC connection can continue to function regardless of NAT
table rebinds.
With this discussion of QUIC in a WebRTC scenario, we set the stage
for introducing the new API for WebRTC using QUIC as proposed by the W3C.
2.2.5. Proposed WebRTC API for QUIC
Peter Thatcher and Bernard Aboba provided a specification of the proposed API
for WebRTC based on QUIC [55]. QUIC can be multiplexed on the
same ports as all traditional WebRTC protocols such as RTP, RTCP or DTLS. This allows
QUIC to coexist with traditional implementations of WebRTC for multimedia as well as
arbitrary data communication. In this section, we illustrate the proposed API in substantial
detail, which is still in the process of being finalized by the IETF QUIC Working Group.
2.2.6. RTCQUICTransport Interface
The RTCQUICTransport interface encapsulates the browser-based real-time communication and
associated configurations using QUIC as the transport protocol. It is comparable with the
RTCPeerConnection interface of the traditional WebRTC specification. An RTCQUICTransport
can be instantiated with an RTCIceTransport and a sequence of RTCCertificate objects.
The RTCIceTransport object encapsulates the information about the ICE transport layer.
This layer deals with the transmission and reception of data and exposes
information about the state of a P2P connection at any given point in time. With
this specification, the handshake between two peers engaged in communication
takes place with the help of ICE.
Loreto and Romano [6] described the need to exchange network reachability information
between the engaging parties in order to establish data streams
between them. This exchange of network reachability information is encapsulated
by the ICE framework. Essentially, ICE helps the peers to discover each other’s presence
in the network before establishing the communication. Local ICE agents are responsible for
activities such as:
• Gathering information about local IP addresses and port numbers.
• Performing connectivity checks between potential peers.
• Sending messages to keep the connectivity alive.
Loreto and Romano [6] further discuss that, after the local session description has been
established, the local ICE agent performs several activities:
• It gathers information about the local IP addresses for potential peers.
• Once the callee is selected, the ICE agent queries the STUN server for the public IP
address and port number under which it is reachable by the callee.
• A TURN server is used as a fallback strategy, i.e., if connectivity cannot be
established with the help of the STUN server, the TURN server is used.
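The fallback order described above (local address first, then the public address learned via STUN, then a TURN relay) can be sketched as a simple selection over gathered candidates. This is an illustrative simplification, not the ICE algorithm itself: the function `select_candidate`, the dictionary layout and the addresses are assumptions made for the example, and real ICE agents compute numeric priorities and check candidate pairs.

```python
from typing import Callable, Dict, List, Optional

# Candidate types in order of preference: a direct local address, the public
# address learned from a STUN server, and a relayed address from a TURN server.
PREFERENCE = ["host", "srflx", "relay"]


def select_candidate(
    candidates: List[Dict], check: Callable[[Dict], bool]
) -> Optional[Dict]:
    """Return the most preferred candidate that passes the connectivity check."""
    ordered = sorted(candidates, key=lambda c: PREFERENCE.index(c["type"]))
    for candidate in ordered:
        if check(candidate):
            return candidate
    return None  # no usable path between the peers


# Hypothetical gathered candidates (addresses are documentation examples).
gathered = [
    {"type": "relay", "address": "203.0.113.10", "port": 50000},
    {"type": "host", "address": "192.168.1.20", "port": 40000},
    {"type": "srflx", "address": "198.51.100.7", "port": 40001},
]

# Assume a network in which only the relayed path passes the connectivity check.
chosen = select_candidate(gathered, check=lambda c: c["type"] == "relay")
print(chosen["type"])  # relay
```

When the connectivity check succeeds for every candidate, the same function returns the host candidate, which reflects that the relay is only a last resort.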
The RTCQUICTransport interface is always in one of several states defined by the
RTCQUICTransportState enumeration type. All possible states are shown in figure 2.9 as a
state machine diagram.
As mentioned before, an RTCQUICTransport is instantiated with an RTCIceTransport object.
Upon instantiation, the RTCQUICTransport assumes the state new. In this state the
local peer has configured its own parameters of the P2P connection, which are encapsulated by
the RTCQUICParameters dictionary. This information is obtainable by invoking the
getLocalParameters() method, which returns an RTCQUICParameters object. In this state, the
peer is ready to process incoming packets but is unable to send any outgoing ones until the
remote fingerprint verification is complete. These remote fingerprints are part of the
RTCQUICParameters object from the remote peer. The peer enters the connecting state with
the invocation of the start() method. In this state, the local peer enters a negotiation phase
with the remote peer. Upon successful negotiation, the local peer can get the transport
information of the remote peer by invoking the getRemoteParameters() method,
which returns another RTCQUICParameters object. A successful negotiation
results in the connected state. In this state the local peer has completed the remote
fingerprint verification and both peers can send and receive data packets.
While in the connected state, the peer may invoke the stop() method to end
the P2P connection and enter the closed state. Alternatively, any error in the communication
leads to the failed state. In the closed or failed state, any invocation of
the start() method only results in an InvalidStateError and does not
change the state. In the closed or failed state the RTCQUICTransport object has finished
its lifetime and is destined for garbage collection.
[Figure 2.9: state machine diagram. States: new (initial), connecting, connected, closed, failed. Transitions: start(), stop(); start() in the closed or failed state results in InvalidState.]
Figure 2.9.: RTCQUICTransport object lifetime as state machine representation
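The lifetime described above can be written down as a small state machine. The following is a sketch only: the class `TransportLifecycle` and the hooks `negotiation_succeeded()` and `communication_error()` are hypothetical names that stand in for events the browser would raise internally, and the behavior of start() in the connecting or connected state is not specified in the text, so it is treated as a no-op here.

```python
class InvalidStateError(Exception):
    """Stand-in for the InvalidStateError named in the text."""


class TransportLifecycle:
    """Models only the state transitions described for RTCQUICTransport."""

    TERMINAL = ("closed", "failed")

    def __init__(self) -> None:
        self.state = "new"

    def start(self) -> None:
        if self.state in self.TERMINAL:
            # The state is left unchanged, as described in the text.
            raise InvalidStateError(f"start() called in state '{self.state}'")
        if self.state == "new":
            self.state = "connecting"

    def negotiation_succeeded(self) -> None:
        """Hypothetical hook: the negotiation with the remote peer completed."""
        if self.state == "connecting":
            self.state = "connected"

    def communication_error(self) -> None:
        """Hypothetical hook: an error occurred during communication."""
        if self.state not in self.TERMINAL:
            self.state = "failed"

    def stop(self) -> None:
        if self.state not in self.TERMINAL:
            self.state = "closed"


transport = TransportLifecycle()
transport.start()                  # new -> connecting
transport.negotiation_succeeded()  # connecting -> connected
transport.stop()                   # connected -> closed
print(transport.state)             # closed

try:
    transport.start()
except InvalidStateError as error:
    print(error)                   # start() called in state 'closed'
```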

RTCQUICParameters Dictionary
Thatcher and Aboba [55] describe that the RTCQUICParameters dictionary encapsulates the
information about the QUIC configuration, which includes the role of a given peer as well as
the transport information. The RTCQUICParameters dictionary typically has an RTCQUICRole
property and a sequence of RTCDTLSFingerprints properties.
RTCQUICRole defines the role of a given peer in P2P communication. It is another
enumeration type, which can assume three values as described in table 2.2 below.
Role     Description
auto     Role is determined based on the ICE defined role. Typically the
         ICE controlled role is designated as the QUIC client and the ICE
         controlling role is designated as the QUIC server.
client   RTCQUICTransport object is designated to have the client role.
server   RTCQUICTransport object is designated to have the server role.
Table 2.2.: RTCQUICRole specification
The determination of the RTCQUICRole needs some illustration as well, since an application
often needs to determine its intended role. When instantiated, an RTCQUICTransport object
assumes the role auto, especially for a browser-based application. The method
getLocalParameters() of the RTCQUICTransport object can be used to determine the actual
role of the local peer once the RTCQUICTransport is in the connected state. Similarly, the
role of the remote peer can be determined from the method getRemoteParameters().
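How the auto role resolves into a concrete role can be sketched as a small function. The function name `resolve_quic_role` and the string values for the ICE roles are assumptions made for illustration; the mapping itself follows table 2.2.

```python
def resolve_quic_role(configured_role: str, ice_role: str) -> str:
    """Resolve the effective QUIC role from the configured role and the ICE role."""
    if configured_role in ("client", "server"):
        return configured_role  # an explicit role is used as-is
    if configured_role != "auto":
        raise ValueError(f"unknown QUIC role: {configured_role}")
    # 'auto': the role follows the ICE role, as summarized in table 2.2.
    mapping = {"controlled": "client", "controlling": "server"}
    if ice_role not in mapping:
        raise ValueError(f"unknown ICE role: {ice_role}")
    return mapping[ice_role]


print(resolve_quic_role("auto", "controlled"))    # client
print(resolve_quic_role("auto", "controlling"))   # server
print(resolve_quic_role("server", "controlled"))  # server
```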
2.2.7. RTCQUICStream Interface
An RTCQUICStream object, as the name suggests, encapsulates the information about a given
stream established over a QUIC connection. It is comparable with the MediaStream object of
the traditional WebRTC specification. It is closely associated with the RTCQUICTransport,
and a stream is instantiated by the createStream() method of the RTCQUICTransport
object.
The RTCQUICStream interface defines the properties and methods described in
tables 2.3 and 2.4 below.
Properties                 Description
transport                  Points to the RTCQUICTransport object that a given
                           RTCQUICStream object is related to.
state                      Designates the state of the RTCQUICStream object,
                           which we illustrate right after.
onstatechange              Defines the consequences of a state change of the
                           RTCQUICStream.
readBufferedAmount         Represents the number of bytes buffered to be read
                           since the last event loop.
maxReadBufferedAmount      Represents the maximum number of bytes that can be
                           buffered.
targetReadBufferedAmount   Represents the target number of bytes in the read buffer,
                           which maintains the back pressure on the sender upon
                           reading the target number of bytes.
writeBufferedAmount        Represents the number of bytes of application data that
                           have been queued using the write method.
maxWriteBufferedAmount     Represents the maximum number of bytes of application data
                           that the implementation allows to be queued by the write
                           method.
Table 2.3.: RTCQUICStream properties specification
Methods                       Description
readInto                      Reads from the incoming RTCQUICStream buffer into the
                              buffer it takes as its first argument and returns the
                              number of bytes read. Returns zero when all data bytes
                              have been read. If the read buffer is empty and the
                              stream has reached its end of life, it returns a
                              negative number.
write                         Writes data bytes to the outgoing RTCQUICStream
                              and queues the data for transport. The data transmission
                              happens concurrently; if any error is encountered, an
                              error event is generated and the application is notified
                              asynchronously.
finish                        Designates the end of the data transmission, de-initializes
                              the RTCQUICStream object and carries out the cleanup
                              procedure.
reset                         Resets the RTCQUICStream object state and then carries
                              out a procedure similar to the finish method.
waitForReadable               Returns a promise that resolves when the incoming data
                              in the read buffer exceeds a given threshold amount. As
                              long as the threshold is not exceeded, the promise is
                              neither resolved nor rejected.
waitForWritable               Returns a promise that resolves when the outgoing data
                              in the write buffer falls below a given threshold. As
                              long as the outgoing data does not fall below this
                              threshold, the promise is neither resolved nor rejected.
setTargetReadBufferedAmount   Sets the threshold for the read buffer of the
                              RTCQUICStream object, which represents the number of
                              bytes the RTCQUICStream can receive at a time.
Table 2.4.: RTCQUICStream methods specification
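The interplay between the buffer properties of table 2.3 and the write and readInto methods of table 2.4 can be modeled in a few lines. This is a toy model rather than the browser API: the class `SketchStream`, the helper `flush_to()` that stands in for the actual transport, and the choice of raising `ValueError` when the write buffer would overflow are all assumptions made for illustration.

```python
class SketchStream:
    """Toy model of the read and write buffers of a stream endpoint."""

    def __init__(self, max_write_buffered_amount: int) -> None:
        self.max_write_buffered_amount = max_write_buffered_amount
        self._write_buffer = bytearray()
        self._read_buffer = bytearray()

    @property
    def write_buffered_amount(self) -> int:
        return len(self._write_buffer)

    @property
    def read_buffered_amount(self) -> int:
        return len(self._read_buffer)

    def write(self, data: bytes) -> None:
        """Queue data for transport unless the write buffer would overflow."""
        if self.write_buffered_amount + len(data) > self.max_write_buffered_amount:
            raise ValueError("write would exceed the maximum write buffered amount")
        self._write_buffer += data

    def flush_to(self, peer: "SketchStream") -> None:
        """Hypothetical transport step: deliver queued bytes to the peer."""
        peer._read_buffer += self._write_buffer
        self._write_buffer.clear()

    def read_into(self, target: bytearray) -> int:
        """Copy buffered bytes into `target` and return the number of bytes read."""
        count = min(len(target), len(self._read_buffer))
        target[:count] = self._read_buffer[:count]
        del self._read_buffer[:count]
        return count


sender = SketchStream(max_write_buffered_amount=8)
receiver = SketchStream(max_write_buffered_amount=8)
sender.write(b"hello")
sender.flush_to(receiver)

chunk = bytearray(3)
print(receiver.read_into(chunk), bytes(chunk))  # 3 b'hel'
print(receiver.read_buffered_amount)            # 2
```

Reading in chunks smaller than the buffered amount leaves the remainder in the read buffer, which is the quantity a waitForReadable-style threshold would be compared against.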
The RTCQUICStream interface is always in one of several states defined by the
RTCQUICStreamState enumeration type. All possible states are shown in figure 2.10
as a state machine diagram.
The initial state of the RTCQUICStreamState is new, which means that the local peer has
not yet started outgoing stream transmission because the DTLS fingerprints of
the remote peer are not yet available to the local peer. The next state is called opening. In this
state the DTLS fingerprint verification with the remote peer is complete and the outgoing
streams are queued for transmission. Then comes the open state, where the stream
transmission has started. In this state, invocation of the stop() method of the RTCQUICTransport
object by either of the peers engaged in the communication leads to the closing state. In the
closing state, the RTCQUICTransport object of the peer starts to close down the stream. Finally,
the closed state means that the stream transmission has ended.
[Figure 2.10: state machine diagram. States: new (initial), opening, open, closing, closed. Transitions: start(), stop().]
Figure 2.10.: RTCQUICStreamState as state machine representation
We conclude our discussion of the proposed QUIC API for WebRTC with the observation
that all information exchanged under the QUIC implementation should be regarded as public,
and that confidentiality is provided by the cryptographic negotiation using TLS
version 1.3.
So far in our foundational study we introduced IoT as a key player in the modern
networking landscape. IoT poses a number of challenges beyond the traditional web
application paradigm because of its inherently distributed nature and the heterogeneity
of its underlying components. The distributed nature contributes to increased complexity
in data management and added security concerns. We observed that standardization of the
framework could be one crucial factor in mitigating these challenges.
In this section, we introduced WebRTC as one of the proposed frameworks which
has the potential to simplify P2P communication and the exchange of arbitrary data. It presents
itself as a possibly optimal fit for the communication backbone of a complex IoT solution.
We dissected its individual components and looked into their internal specifications.
We discussed the traditional and modern specifications of WebRTC. In the next section,
we illustrate the required design specifications for an appropriate standard IoT
framework and analyze whether WebRTC is a viable option.

2.3. Proposed Design Specification
In the previous sections, we prepared the ground by introducing the IoT as a paradigm
and delved into its technological building blocks, architecture, associated challenges, and
its potential implementation areas. Furthermore, we discussed the core concepts of WebRTC
along with its classical API specification based on SCTP secured by DTLS
and the newly proposed API specification based on the emerging QUIC protocol, both
running over connection-less UDP.
We also illustrated that modern IoT implementations are inherently distributed in nature:
physical objects are embedded with sensors, which gather a huge amount of data for
centralized or distributed processing and knowledge generation at a later point in order to
make smart decisions. This paradigm is often termed computation follows data. In
this section we reflect on an alternative paradigm characterized as data follows
computation, which necessitates the real-time communication of data and control flow.
Moreover, we propose WebRTC as a viable framework to achieve it.
Schulzrinne et al. [9] illustrated that in the current generation of IoT implementations the
application or service is usually deployed in the cloud. Physical objects gather data and
transmit it to the cloud for knowledge extraction. The cloud offers infrastructure, platform,
processing power and storage as managed services, which are scalable and optimized for
performance. On the other hand, the shortcomings of this approach include its dependence
on steady network bandwidth and low-latency data transmission, as well as security concerns.
In order to mitigate these challenges a distributed architecture is proposed. This
architecture deploys processing capability nearer to the origin of data, possibly
in nearby network gateways or on the sensor objects themselves. Concrete examples of
this approach are Cloudlets or Mobile Edge Computing (MEC) implementations.
Researchers identified some noteworthy advantages of this new paradigm, such as data security,
service reliability and the possibility of creating more value-added applications that reuse the
data gathering and distributed real-time processing capability. But designing prototypes that
achieve these goals is easier said than done. The scarcity of standard frameworks is
considered one of the main reasons. In the next sections, we illustrate a proposition
to address this issue.

2.3.1. Design Goals for a Standard Framework
We start with specifying the capabilities that we expect from a suitable framework which can
rapidly model distributed applications, referring to the work of Schulzrinne et al. [9].
• P2P Communication
The framework should be able to support direct P2P communication. Once the remote
session is established from both ends, devices should be able to perform application
layer tasks. P2P connection is a crucial requirement for the applications where the
state of both engaged parties may depend on each other.
• Fast Hardware-independent Prototyping
Heterogeneity of hardware and software frameworks is a crucial challenge
in implementing IoT. Researchers found that a distributed IoT application is basically an
orchestration of several nodes, each having its own environment, state and processes. These
nodes communicate with each other through message passing. The framework should
support application prototyping with this heterogeneity abstracted away,
possibly through virtualization.
• Support for Web-Based Applications
Rapid growth and innovation in web-based applications have made the web browser a
critical platform for designing and developing new ideas. Modern browsers such as
Google Chrome, Safari or Firefox support high-level object-oriented programming
languages and protocols and are sufficient to support time-critical applications. The
potential framework should also support mobile platforms.
• Component Based
The framework should be extensible with additional features developed
by third-party developers in the form of packages. The framework should be
capable of using such packaged features seamlessly.

2.3.2. Network Architecture of The Prototype and Session Establishment Capabilities
In light of the design goals we discussed, we now attempt to illustrate the network architec-
ture of the proposed prototype. Furthermore, we describe a scenario for establishing com-
munication in order to explore its capabilities.
Network Architecture Prototype
Figure 2.11 depicts an IoT network architecture prototype involving heterogeneous devices.
Figure 2.11.: IoT Network Architecture Prototype [9]
In the figure, the thinner lines denote traditional Web Socket connections
for signaling and the thicker lines denote dynamic WebRTC P2P connections using
RTCDataChannel APIs. The network features four device types:
• Device Category A represents mobile devices with JavaScript-enabled browsers
capable of supporting the WebRTC protocol stack. Usually, these devices are used to
implement interactive panels and dashboards.
• Device Category B represents IoT devices typically powered by some flavor of
Linux and capable of supporting the WebRTC protocol stack using native libraries.
• Device Category C represents a generic computational unit, such as a server node,
which runs some virtualization technology and hosts several virtual IoT objects.
• Device Category D represents a device which might need to connect to other
networked devices without supporting the WebRTC protocol stack.

Supporting Services encapsulates required services, e.g., STUN, TURN, and Authentication,
Authorization and Accounting (AAA). The Device Registrar is another service, which is
responsible for tracking connected devices in the network and maintaining active connections.
Last but not least, the Proxy Server makes it possible for devices to establish the connection.
Considering the most basic use case for the aforementioned network architecture, any
two given device nodes should be able to communicate bidirectionally using any network
protocol prescribed by the application. Devices which are directly connected to the
network can leverage the P2P signaling mechanism provided by WebRTC without any hassle,
as discussed earlier, but for devices configured behind NAT, STUN or TURN services
need to act as intermediaries for setting up the communication. All signaling processes,
irrespective of WebRTC or Web Socket connections, are coordinated by the device registrar.
The associated signaling mechanisms and setup procedures are abstracted away from the IoT
applications as needed.
Session Establishment Capabilities
Finally, we illustrate a session establishment scenario for our proposed network prototype.
Fette et al. [56] described the Web Socket protocol as the enabler of bidirectional
communication between a remote server and a client in a controlled network environment. The
Web Socket protocol starts with a handshake over TCP and is suited to
browser-based web applications where issuing multiple HTTP requests is not preferred. In
this illustration, the Web Socket protocol takes a pivotal role in connection
establishment.
A device node with Web Socket capability registers with the registrar
by sending a register message along with an authentication token. Upon verifying the token
via a separate AAA service, the registrar enables communication over the respective persistent
connection.
Once two such nodes have joined the network, they are able to initiate a WebRTC peer
session between them. To do this, the nodes first exchange SDP offer and answer
messages, followed by ICE candidate exchange messages.
Typically, after establishing the P2P connection both peers can exchange video and audio
streams. In order to exchange arbitrary data, an additional data channel is required. Two
peers can maintain an indefinite number of simultaneous data channel connections.
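The registration and signaling steps described above can be sketched with an in-memory registrar. This is a deliberately simplified model: the class `Registrar`, its methods, the token check that stands in for the AAA service, and the message layout are assumptions made for illustration, and no Web Socket or WebRTC library is involved.

```python
from typing import Callable, Dict, List


class Registrar:
    """In-memory stand-in for the device registrar that relays signaling."""

    def __init__(self, verify_token: Callable[[str], bool]) -> None:
        self._verify_token = verify_token           # stands in for the AAA service
        self._inboxes: Dict[str, List[dict]] = {}   # node id -> delivered messages

    def register(self, node_id: str, token: str) -> bool:
        """Register a node if its authentication token is accepted."""
        if not self._verify_token(token):
            return False
        self._inboxes[node_id] = []
        return True

    def relay(self, sender: str, recipient: str, kind: str, payload: str) -> None:
        """Forward a signaling message between two registered nodes."""
        if sender not in self._inboxes or recipient not in self._inboxes:
            raise KeyError("both nodes must be registered")
        self._inboxes[recipient].append(
            {"from": sender, "kind": kind, "payload": payload}
        )

    def inbox(self, node_id: str) -> List[dict]:
        return list(self._inboxes[node_id])


registrar = Registrar(verify_token=lambda token: token == "valid-token")
print(registrar.register("node-a", "valid-token"))  # True
print(registrar.register("node-b", "valid-token"))  # True
print(registrar.register("node-c", "wrong-token"))  # False

# Offer/answer exchange followed by an ICE candidate, with placeholder payloads.
registrar.relay("node-a", "node-b", "offer", "SDP offer placeholder")
registrar.relay("node-b", "node-a", "answer", "SDP answer placeholder")
registrar.relay("node-a", "node-b", "candidate", "ICE candidate placeholder")
print([m["kind"] for m in registrar.inbox("node-b")])  # ['offer', 'candidate']
```

In a deployment along the lines of the architecture above, the relay step would travel over the persistent Web Socket connections, and the payloads would be the session descriptions and candidates produced by the WebRTC stack of each node.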
Figure 2.12.: Connection Establishment for devices without a WebRTC Stack [9]
Furthermore, Schulzrinne et al. [9] illustrated the connection establishment procedure for
devices which do not support the WebRTC protocol stack, depicted in figure 2.12.
These devices trigger connection establishment by sending an HTTP CONNECT request to the
proxy server with a unique identifier for the target device. The proxy server maintains a
secure bidirectional communication channel with the target device and in turn mediates
between the source and target during the communication sessions.
So far in our study, we described the core concepts of IoT as a paradigm along with its
fundamental building blocks. We reflected on its inherent challenges and proposed WebRTC as
a candidate standard framework to address them. We explored both
traditional and emerging specifications for WebRTC. Thereafter, we discussed an IoT solution
prototype proposed by Schulzrinne et al. [9], which consists of several heterogeneous nodes,
and explored its orchestration capabilities. We have thereby set the ground to take motivation
from the proposed prototype and build a prototypical IoT application using WebRTC, which
conforms to the design goals of an appropriate standardization framework for IoT and
overcomes the limitations of other related works that we discussed earlier.

3. Implementation
In this chapter, we describe our approach to building a prototypical IoT application which
features WebRTC as the standardization framework. We take motivation from the
architecture proposed by Schulzrinne et al. [9] and build our solution to conform to the
design goals of the aforementioned standardization framework. It should also address the
limitations we highlighted in our review of the related works.
3.1. Prototype for Our Solution
While discussing the design goals of a standard framework for IoT solutions we illustrated
that the heterogeneity of underlying hardware and software needs to be abstracted away in order
to support fast prototyping. This could be achieved with the orchestration of independently
distributed nodes in a virtual environment. Besides, virtualization offers attributes like
availability, isolation, and security. Despite these benefits, virtualization often suffers from
performance bottlenecks, as reported by Regola et al. [57]. For our prototyping purpose, we
need a lightweight solution to spin up the application components quickly in a
resource-constrained environment. The more recently developed Linux container technology works as
a lightweight alternative to traditional hypervisor-based virtualization methods [10].
Linux container technology allows us to distribute the physical machine resources among
multiple isolated contexts of processes. Container-based lightweight virtualization
thus provides abstraction at the level of system processes, while hypervisor-based
virtualization provides abstraction at the level of hardware by means of guest operating systems
[10]. Each container appears as an independent tree of system processes. Figure 3.1
below depicts the contrast between hypervisor-based and container-based virtualization
techniques.

Figure 3.1.: Contrast between container-based and hypervisor-based virtualization [10]
The isolation of processes in Linux container-based virtualization is achieved using kernel
namespaces, while resource management is handled by the control groups mechanism
inherent to Linux. Each container runs within the scope of its namespace and its
access is limited to that namespace. Control groups allow the Docker engine to share the
computational resources among containers and enforce necessary constraints. Some of the early
implementations of container-based virtualization mechanisms are Linux Containers
(LXC), OpenVZ and Solaris Containers. In our study, we introduce Docker as an open
platform built on container-based virtualization.
3.1.1. Docker As Open Container-based Virtualization Platform
Docker is an open platform that helps in running the logical units of an application in iso-
lated, self-contained environments along with their respective dependencies. We describe
the basic building blocks of Docker as follows [11].
• A Container is the most basic building block and a unit of a software
package. Containers encapsulate all dependencies that a given component of the
application needs to run. Containers are instantiated from their respective images and
maintain the state of the containerized application component. Since they represent
an abstraction at the application layer and are executed as isolated environments, an
attack on a container from the outside is less likely to affect the machine
kernel. Therefore, deploying application components in containers is assumed to be
safer.
• An Image is a stateless template that encapsulates the recipe for creating a container.
Docker images are often created from another base image, in which case the
base image and the derived image share some of their layers. Sharing layers among
various images makes Docker suitable for fast application prototyping and delivery.
The recipe for creating an image is provided in a file called Dockerfile.
• A Service is the scaling mechanism of Docker, which can operate across multiple Docker
daemon processes. Daemons usually communicate with each other using the Docker
API. Services maintain the state of the containerized applications and can be distributed
across several Docker daemon nodes. From the perspective of consumers, a service
appears as a single application [11].
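The layer sharing between a base image and the images derived from it can be illustrated with a toy model. This sketch does not use Docker itself: images are modeled as ordered tuples of layer names, and the helpers `derive` and `shared_layers` as well as the layer names are assumptions made purely for illustration.

```python
from typing import Tuple

Image = Tuple[str, ...]  # an image modeled as an ordered tuple of layer names


def derive(base: Image, *new_layers: str) -> Image:
    """Create a derived image by stacking new layers on top of a base image."""
    return base + new_layers


def shared_layers(first: Image, second: Image) -> Image:
    """Return the leading layers that two images have in common."""
    shared = []
    for layer_a, layer_b in zip(first, second):
        if layer_a != layer_b:
            break
        shared.append(layer_a)
    return tuple(shared)


base = ("os", "runtime")                     # hypothetical base image
signaling = derive(base, "signaling-app")    # hypothetical derived images
dashboard = derive(base, "dashboard-app")

print(shared_layers(signaling, dashboard))   # ('os', 'runtime')
```

Only the topmost layer differs between the two derived images, so building or distributing the second image after the first needs to handle just that one additional layer.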
Figure 3.2 describes the client-server architecture of the Docker platform in its operational state.
The Docker Daemon is the long-running process which manages the basic Docker building
blocks, e.g. images, containers, networks or volumes.
Figure 3.2.: Docker architecture [11]
The Docker client and daemon can run on the same host or on separate hosts, and they
communicate with each other over a REST API, typically served over a Unix socket or a
network interface.
Docker Compose is an orchestration tool offered by Docker to deploy applications involving
multiple containers on a single host machine. Docker-compose uses a YAML configuration
file to define and set up the services we want to execute in the containerized
environment. Docker-compose offers a simple command set to build and deploy the
application in a Docker environment.
Docker-compose requires us to define the build environment of its individual services,
typically with the help of a Dockerfile. This makes the application executable
on any platform running a Docker Engine where the docker-compose toolset is installed.
Services must be fully configured with all their required components, which allows them to
execute in isolation, often interacting with each other over defined port numbers
which can be configured either in the YAML configuration or in the Dockerfile. Docker-compose has
several features, briefly discussed below, which make it a suitable tool for fast
prototyping [11].
• Docker-compose uses a project name property to isolate multiple environments. This
is useful to differentiate between projects which use the same service names
on a single host. It can also be used to spin up multiple copies of the same environment
for testing different features of the application.
• It preserves the volumes used by services when a container is created. Volumes are
the mechanism for persisting data used by a container, and they are often
used to share data among multiple containers. Upon execution, Docker-compose
detects whether containers from previous executions exist and, if so, copies their
volumes along with the underlying data to the new containers.
• One particular reason why Docker-compose is suitable for fast prototyping is that it
creates a new container only when needed. Docker-compose caches the configuration
used to create a container, and on each subsequent execution it
reuses the existing containers instead of creating new ones. New containers are only
created when the configuration has changed since the previous execution.
• Docker-compose supports environment variables to customize the orchestration
for different environments or containers.
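To make the role of the configuration file more tangible, the following minimal sketch builds a Compose-style configuration for three services in Python and serializes it as JSON, which is valid YAML. The service names follow the building blocks of our prototype, but the build paths, port numbers and the PORT variable are hypothetical placeholders and not the configuration used in this study.

```python
import json


def build_compose_config(signaling_port: int = 8080) -> dict:
    """Return a Compose-style configuration for three hypothetical services."""
    return {
        "version": "3",
        "services": {
            "signaling": {
                "build": "./signaling",  # assumed directory containing a Dockerfile
                "ports": [f"{signaling_port}:{signaling_port}"],
                # environment variables customize a service per environment
                "environment": {"PORT": str(signaling_port)},
            },
            "publisher": {
                "build": "./publisher",
                "depends_on": ["signaling"],
                # a volume keeps media data outside the container lifecycle
                "volumes": ["./media:/media"],
            },
            "subscriber": {
                "build": "./subscriber",
                "depends_on": ["signaling"],
                "ports": ["3000:3000"],
            },
        },
    }


config = build_compose_config()
compose_json = json.dumps(config, indent=2)
print(compose_json)
```

If the output is saved as docker-compose.json, it could be started with `docker-compose -f docker-compose.json -p demo up`, where the `-p` option sets the project name that isolates this environment from others on the same host.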
Docker-compose offers a way to define and configure dependencies, e.g., databases, queues,
caches and web APIs, for all of the underlying services that it uses to execute an
application. It gives us the flexibility to execute all of its underlying services together
as well as individually, which makes troubleshooting and debugging easier. It helps to
create an isolated testing environment which can be conveniently created and destroyed.
However, the capability of Docker to execute applications in separate isolated containerized
environments is not limited to a single host machine [11].
Docker Swarm is a clustering and scheduling tool for Docker which enables containers to
collaborate while being distributed over multiple host machines in a network. A swarm
is a group of machines that run functioning Docker environments and are joined into
a cluster. Each physical or virtual machine that is a member of the cluster is called
a node. All instructions for the swarm are executed on the cluster by a Swarm Manager.
Swarm managers also coordinate the admission of a given machine to the cluster as a worker
node. We discuss some of the distinguishing features of the swarm mode below [11].
• Docker swarm provides built-in support for cluster management: a swarm of Docker
Engines can be created without needing any additional software tools.
• It enables us to assign roles to nodes at runtime without needing any separate
configuration during deployment. We can deploy nodes uniformly and designate
manager and worker nodes using the Docker Engine. This contributes to faster
deployment.
• It lets us declare the desired state of the services, e.g., we can describe an application
that consists of a front-end service, a back-end database service and further services.
Furthermore, we can declare the desired number of tasks for each service,
which can be scaled up or down on demand.
• Built-in state monitoring keeps the overall cluster in a healthy state throughout the
application execution. For instance, if we declare the number of required containers in
the desired state and some of these containers become unavailable during execution,
the swarm manager dynamically starts the same number of new containers to restore
the desired state of the cluster. This contributes to the high availability of the
application.
• Docker swarm makes it possible to declare a top-level overlay network for the services.
While instantiating the containers, the Swarm Manager automatically assigns
IP addresses to the containers in the cluster. Moreover, the Swarm Manager provides a
uniquely identifiable name for each service and automatic load balancing across the
underlying containers. Therefore, each container under a service can be individually
identified and inspected.
• Each node in the Docker swarm cluster is secured by TLS authentication, and
communication among the nodes of a cluster is always encrypted. Moreover, in
swarm mode, Docker lets administrators deploy service updates to nodes incrementally
and roll back to a previous version if an update fails. These aspects contribute to
security and easier maintainability.
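The desired-state idea behind the swarm features above can be illustrated with a small sketch. It is a toy reconciliation pass written for this discussion, not the algorithm Docker implements; the task naming scheme (service name plus slot number) and the function name are assumptions made for the example.

```python
from itertools import count
from typing import List


def reconcile(service: str, desired: int, running: List[str]) -> List[str]:
    """Return the task list after one reconciliation pass.

    Surplus tasks are dropped when scaling down; missing tasks are
    replaced by new ones in the lowest free slots when scaling up.
    """
    if desired < 0:
        raise ValueError("desired number of tasks must not be negative")
    tasks = list(running[:desired])  # scale down if too many tasks run
    used = set(tasks)
    for slot in count(1):
        if len(tasks) >= desired:
            break
        name = f"{service}.{slot}"
        if name not in used:  # fill the first free slot
            tasks.append(name)
            used.add(name)
    return tasks


# Three tasks are desired, but task web.2 has become unavailable.
print(reconcile("web", 3, ["web.1", "web.3"]))
```

The manager only needs the declared target and the observed state; which tasks to start or stop follows from comparing the two.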
Networking plays a crucial part in our prototype, as in any IoT solution. Docker
handles networking in a platform-independent manner and provides customization
options for networks in the form of drivers.
• The bridge network driver is the default in Docker orchestrations. It allows
containers connected to the same network to communicate over agreed-upon
ports and provides isolation from the host machine as well as from other containers
which are not part of the same bridge network.
• The host network driver removes the network isolation of Docker containers and allows
them to use the network of the host machine directly. Docker swarm services can also
use this driver.
• The overlay network driver provides connectivity across separate Docker daemons. Using
it, we can connect multiple Docker swarm services running on
separate Docker daemons or two stand-alone containers running on separate Docker
daemons.
• The macvlan network driver allows assigning a MAC address to a Docker container.
The container then appears as a physical device on the network, and traffic can be
routed to it using its MAC address.
• The none network driver is used to completely isolate a container from Docker networks.
Apart from these built-in network drivers, Docker also allows third-party
network plug-ins. For the most common use cases in a Docker environment, we usually
default to the bridge network driver.
So far we have made the case for Docker as a fast, hardware-independent
prototyping platform for our study. We intend to build a prototype of an IoT application
using WebRTC as the standard framework. In our prototype, we want to orchestrate a service
provider and a service consumer with a signaling server in the middle as intermediary,
as prescribed by the standard convention of WebRTC. In the upcoming approach section, we
delve into the architecture and design approaches along with the state of the art for our
prototype. Prior to that, we focus on the individual components that we orchestrate to
build the prototype and their respective constraints.
3.1.2. Components of our Prototype and their Constraints
Figure 3.3.: Overview of publish subscribe pattern [12]
In this section, we focus on the individual components of our prototype. We illustrate
their individual roles and responsibilities and discuss the languages and frameworks
we used to implement them. Publish-subscribe is a messaging pattern that lets
separate components communicate in a distributed system. The three major components of
the publish-subscribe pattern are publishers, which broadcast messages without knowledge
of the subscribers; subscribers, which listen for messages of interest without knowledge
of the publishers; and the mediator or event service bus, which mediates between
publishers and subscribers.
Subscribers express their interest in a particular event or a category of events
to the event service, which in turn notifies the subscribers when the respective
events are published by the publishers. The event notifications are communicated
asynchronously to the interested subscribers as described in Figure 3.3 [12]. Messages are
filtered based on topic or content [58]. Some of the notable advantages of the
publish-subscribe pattern are
• Loose Coupling
Publisher and subscriber components do not need to be concerned about the system
they are part of. They can operate independently. In contrast, in a tightly coupled
system such as the client-server paradigm, the client can only send a request when the
server is listening, and the server can only receive a request when the client is
operational. Typically, the client needs to know the location of the server in the
network, which makes the server susceptible to security risks. With the publish-subscribe
paradigm, distributed systems can achieve location transparency, since publisher and
subscriber do not need to know the network address of each other. Moreover, publisher
and subscriber do not need to be operational at the same time in order for communication
to take place.
• Scalability
Loose coupling helps publish-subscribe systems to scale via strategies such as
concurrent operation and caching. Implementations of web syndication protocols
such as RSS have enabled distributed messaging on a wide scale, at the cost of
tolerating higher latency and weaker message delivery guarantees [59].
• Security
Loose coupling also contributes to the security of the complete system.
The event service is the component which plays a major role in achieving this. Because
the publisher and the subscriber do not need to know each other, a malicious attack
on one end does not directly affect the other end and sabotage the whole
system. Typically, the event service implements some security mechanism in order
to abstract away the path of interaction between the publisher and the subscriber.
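The pattern described above can be sketched in a few lines of Python. This is a minimal in-memory illustration with topic-based filtering, assuming synchronous delivery for brevity, whereas real event services notify subscribers asynchronously; the class name EventService and the topic names are chosen for the example only.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventService:
    """Mediates between publishers and subscribers by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # The subscriber registers interest without knowing any publisher.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        # The publisher hands over a message without knowing any subscriber.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that were notified


bus = EventService()
received: List[Any] = []
bus.subscribe("video", received.append)

delivered = bus.publish("video", "frame-1")  # one interested subscriber
ignored = bus.publish("audio", "sample-1")   # nobody subscribed to this topic
print(received, delivered, ignored)
```

Neither side holds a reference to the other; both only know the event service and a topic name, which is the loose coupling discussed above.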
Because of these advantages, the publish-subscribe pattern has received attention from
industry and researchers, which in turn produced several novel prototypes and variations
of the traditional publish-subscribe pattern, cf. Eugster et al. [12]. In our prototype,
we took inspiration from the traditional publish-subscribe pattern but did not adopt it
in its entirety, in order to focus on our particular use case. In this study, we provide
our own definitions and constraints for the publisher, the subscriber, and the mediator.
Publisher
The publisher is a lightweight component and the first building block of our prototypical
IoT application. It performs the specific task of capturing and recording video and
offers this video track for subscription. The mediator provides the publisher with an offer
from a potential subscriber, to which the publisher responds with its answer along with the
video stream. Since we are building an IoT application prototype, we categorize the
publisher as a sensor which transforms an analog video feed into a digital data stream for
transmission over the network.
The constraints on the publisher call for an underlying software framework that has a small
memory footprint and can be executed on embedded devices equipped with a camera module.
It should also provide an implementation of the WebRTC specification, since we assume
WebRTC as the standard framework for our prototype. We introduce aiortc as a library
which implements the WebRTC specification and offers an API to use it in
our prototype [60].
aiortc is a Python library for WebRTC and Object Real-Time Communication (ORTC). The
library is built on asyncio, the built-in asynchronous programming framework of Python.
The main reason for choosing Python as the programming language for the publisher service
is that Python is easy to use, quick to develop with and works well on resource-constrained
embedded devices, which most often run lightweight Linux distributions. Python scripts are
executable from the Linux shell without needing much prior configuration.
The most crucial reason for choosing aiortc is that it implements the WebRTC specification
without requiring a browser. In contrast, most WebRTC implementations
are inherently browser-dependent and tightly coupled with the underlying media stack.
aiortc, on the other hand, uses PyAV [61]. PyAV is a Python wrapper for FFmpeg [62], which
is a cross-platform framework for working with audio and video data. FFmpeg offers a robust
API to manipulate almost any kind of media on all popular operating system platforms. PyAV
offers access to media in many forms, e.g., containers, frames and streams, with reasonable
transformations, creating a layer of abstraction over FFmpeg. On an embedded device running
a Linux distribution, PyAV can extract and record media from an attached camera module
or even from pre-recorded video files. The most useful APIs that aiortc defines in
accordance with the WebRTC specification are
• RTCPeerConnection represents a WebRTC connection object between a local and a
remote peer and it manipulates
- RTCSessionDescription type which describes the connection
- RTCConfiguration type which provides options to configure the peer connec-
tion object.
• RTCIceGatherer collects the connection information as per the Interactive Connectivity
Establishment (ICE) specification. It manipulates
- RTCIceTransport which provides information about the ICE exchange state
and current role of a given peer.
- RTCIceParameters which is the dictionary which accommodates the parame-
ters of ICE.
- RTCIceServer which encapsulates the credentials and address to connect a
given STUN or TURN server.
• RTCDtlsTransport encapsulates the DTLS transport information and manipulates
- RTCCertificate which represents the certificate used by RTCDtlsTransport.
- RTCDtlsParameters which includes the configuration for the DTLS.
- RTCDtlsFingerprint which holds the hash function algorithm and the fingerprint
of an RTCCertificate.
• RTCRtpSender encapsulates the encoding and transport information for a media stream
object. Similarly, RTCRtpReceiver deals with the decoding and receipt information,
and the RTCRtpTransceiver object pairs an RTCRtpSender with an RTCRtpReceiver.
RTCRtpParameters and RTCRtpCodecParameters encapsulate the RTP configuration and codec
settings for transport.
• RTCSctpTransport encapsulates the capabilities of SCTP and manipulates the
RTCSctpCapabilities type, which keeps the information about the SCTP capabilities in
a dictionary.
• RTCDataChannel encapsulates the capabilities for transmitting arbitrary data. Its
underlying configuration is tracked by the RTCDataChannelParameters type.
• MediaStreamTrack is the object that represents a single track of a media stream. It
is the parent type of AudioStreamTrack and VideoStreamTrack.
aiortc relies on several video capture formats and pixel formats. It primarily offers two
APIs to capture and record video frames
• The MediaPlayer API can be instantiated with the path to a web camera interface, a
URL of an online HTTP stream or the path to a video file. It captures the video
feed and transforms it into a MediaStreamTrack object for transmission over the network.
Behind the scenes, it currently uses one of the following input formats for video capture
- avfoundation is the audio and video framework for Apple platforms such as
macOS, iOS, watchOS and tvOS.
- dshow, short for DirectShow, is the multimedia framework developed by
Microsoft mainly for Windows platforms.
- v4l2, short for Video4Linux2, is a collection of drivers and an API to capture
video on Linux systems.
- vfwcap, short for Video for Windows capture, is an FFmpeg input device to capture
video on the Windows platform.
• The MediaRecorder API can be instantiated with the path to the location where the
recorded video should be kept. As the name suggests, it is primarily used to record
MediaStreamTrack objects into a file. It currently uses one of the following pixel
formats to record video
- rgb24 is the format in which each of the red, green and blue components of a
single pixel occupies 1 byte of memory. Each pixel therefore takes 3 bytes or 24 bits,
which explains the name of the format.
- yuv420p is one of the supported formats of the YUV encoding system for images.
The YUV system prioritizes the human perception of an image over color accuracy.
This makes it possible to drop some of the color detail of a video frame when
bandwidth is low.
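The difference between the two pixel formats can be made concrete with a short calculation of the raw size of one uncompressed frame. The sketch assumes the usual planar layout of yuv420p, i.e. a full-resolution luma plane plus two chroma planes subsampled by a factor of two in each direction; the function name and the chosen resolution are illustrative.

```python
def frame_size_bytes(width: int, height: int, pix_fmt: str) -> int:
    """Return the size of one raw video frame in bytes."""
    if pix_fmt == "rgb24":
        # one byte each for the red, green and blue component of every pixel
        return width * height * 3
    if pix_fmt == "yuv420p":
        # full-resolution Y plane plus U and V planes at half width and height
        chroma_width = (width + 1) // 2
        chroma_height = (height + 1) // 2
        return width * height + 2 * chroma_width * chroma_height
    raise ValueError(f"unsupported pixel format: {pix_fmt}")


rgb = frame_size_bytes(640, 480, "rgb24")
yuv = frame_size_bytes(640, 480, "yuv420p")
print(rgb, yuv, yuv / rgb)
```

For a 640 x 480 frame this gives 921600 bytes for rgb24 and 460800 bytes for yuv420p, i.e. half the raw data per frame before any codec is applied.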
We illustrate the use of the aiortc framework in our implementation in the approach
section.
Subscriber
We define the subscriber as the second building block of our prototypical IoT application
and the consumer of the media stream captured and offered by the publisher. The subscriber
can take a multitude of forms, e.g., a web browser, a smartphone or another hand-held smart
device capable of recording and playing multimedia content. Typically, a subscriber is less
constrained in terms of processing capability than the publisher in our prototype.
Therefore, the only constraint on the subscriber is that it natively supports the
WebRTC specification stack, so that it can engage in the signaling mechanism using SDP as
prescribed by WebRTC.
All popular web browsers natively support the WebRTC specification these days. For the sake
of simplicity, we assume our subscriber module to be a web page hosted by a web
application. We therefore need a framework that can spin up a web
application which hosts the web page and takes care of all interactions with the page as
well as with the mediator component.
Node.JS is an event-driven platform based on non-blocking I/O for creating asynchronous
server-side JavaScript applications [63]. Node.JS treats the event loop as a runtime
construct. In many other systems, the event loop has to be started with a blocking method
call, which waits in an idle state until the result of the call becomes available. Node.JS
instead depends heavily on callback functions and exits the event loop when no
callback functions remain to be executed. Each callback is executed at the point when the
respective result becomes available. Due to its non-blocking nature, Node.JS can handle
multiple operations concurrently. Because HTTP support is built in, Node.JS is
well suited to web development. The category of
applications Node.JS is specially designed for is often referred to as Data Intensive
Real-Time applications [63]. It allows a server to keep several connections alive while
serving concurrent requests.
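The non-blocking behavior described above is not specific to Node.JS. As an illustration, the following sketch shows the same idea with asyncio, the Python framework on which aiortc is built, rather than with JavaScript. The operation names and delays are arbitrary; asyncio.sleep stands in for an I/O operation such as a network request.

```python
import asyncio
from typing import List


async def operation(name: str, delay: float, finished: List[str]) -> None:
    # Awaiting hands control back to the event loop instead of blocking it.
    await asyncio.sleep(delay)
    finished.append(name)


async def main() -> List[str]:
    finished: List[str] = []
    # All three operations are in flight at the same time on one thread.
    await asyncio.gather(
        operation("slow", 0.3, finished),
        operation("fast", 0.1, finished),
        operation("medium", 0.2, finished),
    )
    return finished


completion_order = asyncio.run(main())
print(completion_order)
```

The operations complete in the order of their delays, not in the order in which they were started, and the whole run takes roughly as long as the slowest operation rather than the sum of all three.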
On top of Node.JS as a platform, we often use multiple packages to perform specific
tasks in our application. Therefore, we need a central repository that maintains the
collection of useful packages. npm is referred to as the largest registry of JavaScript
packages in the world, and it works seamlessly with Node.JS. npm consists of three distinct
components
• The website helps mainly to discover packages and to manage public and private
packages.
• The CLI executes practically every npm operation, e.g., it initializes
JavaScript projects and installs, removes, publishes and un-publishes packages.
• The registry is a public database of JavaScript packages. When we search for a
package from the command line, npm looks it up in its registry.
In our implementation of a prototypical IoT application, we used several packages from
the npm registry in the subscriber module. We briefly mention their purposes here.
• express is a simple, easy-to-use Node.JS web application framework for creating an
HTTP server that serves as the backbone of web pages. It provides utility methods and
middleware to create web APIs. Express provides a thin layer of
abstraction over standard Node.JS features without obscuring them [64].
• request is a lightweight Node.JS package for making standard HTTP method
calls. Although JavaScript provides several options to make HTTP calls natively, the
philosophy behind the request module is to use the asynchronous non-blocking
I/O of Node.JS to provide the simplest possible API for making HTTP requests
with callbacks [65].
• socket.io offers a Node.JS server and a JavaScript client library to enable real-time,
bidirectional, event-based communication in web browsers and smart devices.
Socket.io attempts to establish a WebSocket connection and falls back to HTTP
long-polling where proxies or firewalls prevent it. It uses a heartbeat mechanism to
automatically detect connection and disconnection and provides an auto-reconnect feature
out of the box. We can emit any serializable object with custom events using socket.io
[66].
We illustrate the functional role of the subscriber component in detail in the approach section.
Mediator
The mediator is the third fundamental building block of our implementation and acts as an
intermediary between the publisher and the subscriber components. The mediator in our IoT
prototype adopts the role of the event service in the traditional publish-subscribe
messaging pattern, within its own constraints. The subscriber generates its own session
description, called the offer, which represents its wish to consume the service of the
publisher along with its preferred configuration for connectivity and media consumption,
and sends it to the mediator. It is then the responsibility of the mediator to find the
appropriate publisher and forward the session description of the subscriber. The publisher,
in turn, generates its acknowledgment as a separate session description, called the answer,
which represents its own configuration for connectivity and the media that it offers, and
communicates it back to the mediator. Finally, the mediator communicates the response of
the publisher back to the subscriber. At this point, both the publisher and the subscriber
have all the information required to connect to each other and exchange binary data. The
mediator therefore makes it possible to establish a P2P connection without the publisher
and the subscriber having to exchange signaling messages directly. It upholds the aspects
of loose coupling, scalability, and security in our implementation.
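The relay role of the mediator can be summarized in a small sketch. It models the exchange in memory only: session descriptions are plain dictionaries, the publisher is a callable, and the service name "camera" as well as the placeholder SDP strings are invented for the example. In the prototype the same exchange happens over HTTP.

```python
from typing import Callable, Dict

SessionDescription = Dict[str, str]  # {"type": "offer" or "answer", "sdp": "..."}
OfferHandler = Callable[[SessionDescription], SessionDescription]


class Mediator:
    """Forwards an offer to the matching publisher and returns its answer."""

    def __init__(self) -> None:
        self._publishers: Dict[str, OfferHandler] = {}

    def register_publisher(self, service: str, handler: OfferHandler) -> None:
        self._publishers[service] = handler

    def relay_offer(self, service: str, offer: SessionDescription) -> SessionDescription:
        if offer.get("type") != "offer":
            raise ValueError("subscriber must send a description of type 'offer'")
        if service not in self._publishers:
            raise LookupError(f"no publisher registered for {service!r}")
        answer = self._publishers[service](offer)
        if answer.get("type") != "answer":
            raise ValueError("publisher must reply with a description of type 'answer'")
        return answer


def camera_publisher(offer: SessionDescription) -> SessionDescription:
    # Stand-in for the real publisher, which would create a WebRTC answer.
    return {"type": "answer", "sdp": "answer-for:" + offer["sdp"]}


mediator = Mediator()
mediator.register_publisher("camera", camera_publisher)
answer = mediator.relay_offer("camera", {"type": "offer", "sdp": "placeholder-offer"})
print(answer)
```

The subscriber only needs to know the mediator and the name of the service it wants to consume; the location of the publisher stays hidden behind the mediator.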
The constraints on the mediator in our implementation are that it hosts a web application
and offers paths through which the publisher and the subscriber can reach it with their
session descriptions. Moreover, the mediator needs to serve several subscribers at the same
time. It needs the same asynchronous, non-blocking, event-driven approach that we take for
the subscriber. Node.JS is, therefore, an appropriate framework to implement the mediator.

3.2. Approaches for Building our Solution
We illustrated the fundamental building blocks of our implementation along with their
constraints and the frameworks we chose to implement them. In this section, we discuss the
approaches we took to build our prototype. We discuss primarily two approaches from
the logical point of view. We need to mention at this point that in a traditional WebRTC
peer connection either of the publisher and subscriber components can initiate a connection
and expect the other peer to respond with an answer. In more sophisticated implementations
it is possible for both peers to join a room, which is the object that maintains the
connections and ICE exchange state for both peers. This methodology is ideal for use cases
where both peers engage in a video call and each adds a media track to the peer connection.
However, in our case, the publisher module is an embedded device with a camera module which
only captures and transmits video, and the subscriber module is a lightweight client which
only consumes the video. Therefore, we let the subscriber call the addTransceiver
method on its RTCPeerConnection in order to create an RTCRtpTransceiver object and add it
to the list of transceivers of the peer connection. An RTCRtpTransceiver object can have
one of the following directions, which affect the behavior of RTP [67]
• sendrecv represents the intention to both send and receive the RTP data if the offer is
accepted by the other peer and responded with an answer.
• sendonly represents the intention to only send the RTP data if the offer is accepted by
the other peer and responded with an answer.
• recvonly represents the intention to only receive RTP data if the offer is accepted by
the other peer and responded with an answer.
• inactive represents the intention neither to send nor to receive RTP data from the remote peer.
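How the direction in an offer constrains the direction in the answer can be expressed as a small function. The sketch follows the general SDP offer/answer rule that a peer can only send if the other side is willing to receive, and vice versa; the function and parameter names are our own and are not part of any WebRTC API.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def answer_direction(offered: str, will_send: bool, will_receive: bool) -> str:
    """Return the direction an answerer can use for a given offered direction."""
    if offered not in DIRECTIONS:
        raise ValueError(f"unknown direction: {offered}")
    offerer_sends = offered in ("sendrecv", "sendonly")
    offerer_receives = offered in ("sendrecv", "recvonly")
    send = will_send and offerer_receives    # sending needs a receiving offerer
    receive = will_receive and offerer_sends  # receiving needs a sending offerer
    if send and receive:
        return "sendrecv"
    if send:
        return "sendonly"
    if receive:
        return "recvonly"
    return "inactive"


# A subscriber that offers recvonly, answered by a publisher with a camera track:
print(answer_direction("recvonly", will_send=True, will_receive=False))
```

For the prototype this means that a recvonly offer from the subscriber is matched by a sendonly answer from the publisher, which results in the one-way media flow we are after.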
In our implementation, the subscriber only wants to receive the media stream. To start
the signaling procedure, it instantiates the RTCPeerConnection object and sets the
direction of its RTCRtpTransceiver object to recvonly. In the remainder of this section, we
describe the two approaches which we took thereafter in order to establish a functional
peer connection and a one-way transmission of the media stream. We started with the most
basic approach, as described in Figure 3.4.

Figure 3.4.: Approach 1 for implementation of prototypical application
1. The subscriber creates its offer with the asynchronous createOffer method of the
RTCPeerConnection. The offer is an RTCSessionDescription object which encapsulates
information about optional media tracks, all media codecs supported by the browser
and any ICE candidates gathered so far for establishing the peer connection.
2. The subscriber configures the properties of the local end of the peer connection object
by calling the setLocalDescription method. This method is executed asynchronously
and takes effect when negotiation is completed by both peers. Calling this method
starts ICE candidate gathering; each candidate is delivered as an RTCIceCandidate
object through the icecandidate event.
3. The subscriber uses an HTTP POST request to reach the publisher directly on an
agreed path with the offer.
4. Upon receiving the offer, the publisher calls the setRemoteDescription method in
order to set the properties of the remote end of the connection, which is the subscriber
from the perspective of the publisher. This method is executed asynchronously and
only takes effect when negotiation is completed by both peers.
5. The publisher instantiates the MediaPlayer API of the aiortc framework with the video
interface and adds an instance of the VideoStreamTrack object to the peer connection
object at its end.
6. The publisher creates an answer to the offer it received from the subscriber with the
createAnswer method of the peer connection object. The answer object is another
session description, encapsulating information about the media stream that was
added to the peer connection, the codecs it supports and any ICE
candidates the peer connection has gathered so far.
7. The publisher uses the setLocalDescription method in order to set the properties of
its own end of the peer connection. This method is executed asynchronously and only
takes effect when negotiation is completed by both peers.
8. The publisher sends back its answer object as the response to the HTTP POST request
made by the subscriber.
9. Upon receiving the answer as a response from the publisher, the subscriber sets the
properties of the remote end of its peer connection object by passing the answer object
to the setRemoteDescription method.
10. At this point, all asynchronous method invocations are fulfilled by both parties. As a
result, both the publisher and the subscriber know their own configuration for the peer
connection and the configuration of their respective remote peers. The iceGatheringState
property of the peer connections of both parties is set to complete. The subscriber can
now consume the captured media stream which is transmitted by the publisher.
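Steps 4 to 8 on the publisher side can be sketched with aiortc as follows. This is a minimal illustration and not the code of our prototype: it assumes aiohttp as the HTTP server, a hypothetical path /offer, and a Linux camera at the hypothetical device path /dev/video0. The third-party imports are guarded so that the pure helper parse_offer can be used without them.

```python
import json

try:
    from aiohttp import web
    from aiortc import RTCPeerConnection, RTCSessionDescription
    from aiortc.contrib.media import MediaPlayer
except ImportError:  # the WebRTC parts below need aiohttp and aiortc installed
    web = RTCPeerConnection = RTCSessionDescription = MediaPlayer = None

pcs = set()  # keeps references to open peer connections


def parse_offer(body: str) -> dict:
    """Validate the JSON body posted by the subscriber (step 3)."""
    params = json.loads(body)
    if params.get("type") != "offer" or not params.get("sdp"):
        raise ValueError("expected a session description of type 'offer'")
    return {"type": params["type"], "sdp": params["sdp"]}


async def handle_offer(request):
    try:
        params = parse_offer(await request.text())
    except ValueError as exc:
        return web.json_response({"error": str(exc)}, status=400)

    pc = RTCPeerConnection()
    pcs.add(pc)

    # Step 4: describe the remote end (the subscriber).
    offer = RTCSessionDescription(sdp=params["sdp"], type=params["type"])
    await pc.setRemoteDescription(offer)

    # Step 5: capture the camera and add its video track (assumed device path).
    player = MediaPlayer("/dev/video0", format="v4l2")
    pc.addTrack(player.video)

    # Steps 6 and 7: create the answer and describe the local end.
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)

    # Step 8: return the answer as the response to the HTTP POST request.
    return web.json_response(
        {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}
    )


def create_app():
    app = web.Application()
    app.router.add_post("/offer", handle_offer)  # hypothetical agreed path
    return app
```

With both libraries installed, the application could be served with `web.run_app(create_app(), port=8080)`, where the port number is again only an example.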
Having built our first functioning prototype, we need to review this approach in
retrospect. We observe that in this implementation we use two of our primary building
blocks and have them communicate with each other directly. While this works, there are
certain drawbacks from the following perspectives
• The publisher and subscriber components are tightly coupled. They both must know how
to address one another in the network. Any change in the path exposed by the publisher
leads to a subsequent change at the subscriber end. This means that in a
real-world IoT solution, if the publisher and subscriber components are built
by separate stakeholders, any change involves coordination between both parties.
• Tight coupling between publisher and subscriber undermines the security of
the whole system. The effect of a malicious attack on one end may easily propagate to
the other end.
Furthermore, in this approach the subscriber module is implemented in Python as a web
application which hosts a static web page. The page includes a single JavaScript file which
drives the WebRTC stack and makes HTTP connections to the publisher to exchange
the session descriptions. At runtime, the JavaScript code is downloaded to the browser and
executes in the context of the host machine. This solution is not robust enough to scale to
more complex implementations and does not uphold the separation of concerns aspect of
object-oriented design principles.
In light of these drawbacks, we introduce our third building block, the mediator, into the
orchestration. In accordance with the standard WebRTC terminology, we name this component
the Signaling service in our implementation. We also revise our subscriber component with
Node.JS and Socket.io. Figure 3.5 illustrates the revised subscriber component. Native
JavaScript code drives the WebRTC stack while offloading the responsibility for HTTP
network interactions to the web application.
Figure 3.5.: Socket.io implementation for interaction between static page and subscriber
web application
1. Native JavaScript drives the WebRTC stack. It generates an offer at the
subscriber end and emits a dedicated socket event to notify the web application about the
offer generation.
2. Upon being notified of the offer generation event together with the offer object, the
web application communicates with the Signaling server and waits for the response.
3. The Signaling server performs its designated processes and finally sends back a response
to the subscriber web application with the answer object from the publisher component.
4. The subscriber web application receives the answer object of the publisher and
emits a socket event in order to communicate the object to the native code running
behind the static web page. This completes the signaling procedure.
Now that we have dissected the revised subscriber component, we can focus on the big
picture of the revised orchestration among our three fundamental building blocks. Figure 3.6
below demonstrates the revised approach after we incorporate the Signaling component to
mediate and act as a bridge between the publisher and the subscriber.
Figure 3.6.: Approach 2 for implementation of prototypical application
1. The Subscriber creates its offer with the asynchronous createOffer method of the
RTCPeerConnection. The offer is an RTCSessionDescription object which encapsulates
information about optional media tracks, all media codecs supported by the browser
and any ICE candidates gathered so far for establishing the peer connection.
2. The Subscriber configures the properties of the local end of the peer connection
object by calling the setLocalDescription method. This method is executed
asynchronously and takes effect when negotiation is completed by both peers. Calling this
method starts ICE candidate gathering; each candidate is delivered as an RTCIceCandidate
object through the icecandidate event.
3. The Subscriber communicates its offer with an HTTP POST request to the Signaling
server, which exposes a dedicated path for the subscriber to reach.
4. The Signaling server communicates the offer of the subscriber to the publisher via
a dedicated path exposed by the web application of the publisher for the Signaling server
to reach.
5. Upon receiving the offer, the publisher calls the setRemoteDescription method in
order to set the properties of the remote end of the connection, which is the subscriber
from the perspective of the publisher. This method is executed asynchronously and
only takes effect when negotiation is completed by both peers.
6. The Publisher instantiates the MediaPlayer API of the aiortc framework with the
video interface and adds an instance of the VideoStreamTrack object to the peer
connection object at its end.
7. The Publisher creates an answer to the offer it received from the subscriber with the
help of the createAnswer method of the peer connection object. The answer object is
another session description, encapsulating information about the media stream
that was added to the peer connection, the codecs it supports and any
ICE candidates the peer connection has gathered so far.
8. The Publisher uses the setLocalDescription method in order to set the properties
of its own end of the peer connection. This method is executed asynchronously and
only takes effect when negotiation is completed by both peers.
9. The Publisher sends its answer back to the Signaling server as an HTTP response.
10. Upon receiving the answer from the Publisher, the Signaling server sends the answer
back to the subscriber as an HTTP response.
11. Upon receiving the answer of the Publisher from the Signaling server, the Subscriber
sets the properties of the remote end of its peer connection object by passing the answer
object to the setRemoteDescription method.

12. At this point, all the asynchronous method invocations by both parties are fulfilled.
As a result, both the publisher and the subscriber know their own configuration for the
peer connection and the configuration of their respective remote peers. The
iceGatheringState property of the peer connections of both parties is set to complete. The
Subscriber can now consume the media stream captured and transmitted by the publisher
directly.
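[Sequence diagram with three lifelines: Subscriber, Signaling, Publisher. Messages in order:
createOffer() and setLocalDescription(localOffer) on the Subscriber; POST(subscriberOffer)
from Subscriber to Signaling; POST(subscriberOffer) from Signaling to Publisher;
setRemoteDescription(remoteOffer), addTrack(), createAnswer() and
setLocalDescription(localAnswer) on the Publisher; RESP(publisherAnswer) from Publisher to
Signaling; RESP(publisherAnswer) from Signaling to Subscriber;
setRemoteDescription(remoteAnswer) on the Subscriber; RTCPeerConnection between Subscriber
and Publisher.]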
Figure 3.7.: Sequence diagram for implementation
Figure 3.7 presents a sequence diagram for this approach of our prototype implementation.
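The order of calls in this exchange can be sketched in a few lines of Python. The following is
a minimal illustration under stated assumptions, not code from our prototype: FakePeer is a
hypothetical stand-in for RTCPeerConnection that only records which method was called, and
the signaling server is reduced to a plain function call instead of HTTP POST requests.

```python
class FakePeer:
    """Hypothetical stand-in for RTCPeerConnection; it only records calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.local = None
        self.remote = None

    def create_offer(self):
        self.log.append(f"{self.name}.createOffer")
        return {"type": "offer", "sdp": f"sdp-from-{self.name}"}

    def create_answer(self):
        # An answer can only be created once a remote offer has been applied.
        if self.remote is None or self.remote["type"] != "offer":
            raise RuntimeError("createAnswer needs a remote offer first")
        self.log.append(f"{self.name}.createAnswer")
        return {"type": "answer", "sdp": f"sdp-from-{self.name}"}

    def set_local_description(self, desc):
        self.log.append(f"{self.name}.setLocalDescription({desc['type']})")
        self.local = desc

    def set_remote_description(self, desc):
        self.log.append(f"{self.name}.setRemoteDescription({desc['type']})")
        self.remote = desc

    def add_track(self):
        self.log.append(f"{self.name}.addTrack")


def publisher_handle_offer(publisher, offer):
    """Steps 5 to 9: the publisher turns a received offer into an answer."""
    publisher.set_remote_description(offer)
    publisher.add_track()
    answer = publisher.create_answer()
    publisher.set_local_description(answer)
    return answer


def signaling_relay(offer, publisher, log):
    """Steps 4 and 10: forward the offer, then hand the answer back."""
    log.append("signaling.forwardOffer")
    answer = publisher_handle_offer(publisher, offer)
    log.append("signaling.returnAnswer")
    return answer


def negotiate():
    log = []
    subscriber = FakePeer("subscriber", log)
    publisher = FakePeer("publisher", log)
    offer = subscriber.create_offer()                 # step 1
    subscriber.set_local_description(offer)           # step 2
    answer = signaling_relay(offer, publisher, log)   # steps 3 to 10
    subscriber.set_remote_description(answer)         # step 11
    return subscriber, publisher, log
```

After negotiate() returns, each stand-in peer holds its own description as local and the
description of the other side as remote, which corresponds to the state described in step 12.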

3.3. Environment and Orchestration for Our Solution
We illustrated the fundamental building blocks of our prototype along with their individual
constraints as well as our approach so far to build the prototype. We introduced container-
based virtualization as an enabler of fast prototyping as well as a contributor to the aspects
of isolation, availability, and security. In this section, we illustrate the environment for our
prototypical application and the orchestration of our building blocks in the environment.
In order to arrive at a viable solution for the orchestration of the publisher, the subscriber
and the signaling components, we need to reconsider the networking options that Docker offers
by default. To illustrate this, we consider the scenario of our subscriber component. The
subscriber component consists of a web application sub-component, which handles the network
operations across containers, and native JavaScript code, which drives the WebRTC
specification stack. Since the WebRTC stack is natively available in web browsers, this piece
of code needs to be executed in the web browser itself. Therefore, at runtime, web browsers
download the HTML along with the native JavaScript code and execute them in the context of a
host machine. As per our design, socket events emitted by this native JavaScript code must
reach the web application, which runs in a Node.js environment inside the container and is
isolated from the host machine with the help of namespaces. We therefore need a more reliable
and robust solution for routing our traffic across containers correctly in the orchestration,
since relying on the default Docker bridge network is inconvenient for this purpose.
Security platforms operate as secure bridges between private and public networks. They are
often used to access web applications hosted in a secure private network from web browsers.
A reverse proxy is one of the mechanisms of such platforms: it accepts network requests on
behalf of the servers behind it and then takes up the responsibility of forwarding the
requests to the appropriate servers hosted in private or public networks based on some agreed
parameters [68]. Corresponding responses are in turn first received by the reverse proxy
before being delivered back to the browser. We used a reverse proxy mechanism in our
prototypical application in order to remove the dependency on the default Docker bridge
networks.
Traefik [69] is a modern easy-to-use HTTP reverse proxy and load balancer solution. We
integrated traefik to be executed as a standalone service in our Docker orchestration as
shown in code listing 3.1 below. In this orchestration, we named the subscriber component
client. Since traefik handles the routing among the containers in the context of the host
machine, it needed to run in privileged mode inside the Docker environment.
Listing 3.1: Orchestration of initial approach with all services in a single host
version: '3'
services:
  traefik:
    privileged: true
    command:
      --api --docker --docker.domain="docker.localhost"
      --docker.endpoint=unix:///var/run/docker.sock
      --docker.watch=true --docker.exposedbydefault="false"
    container_name: traefik
    image: traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 9999:80
      - 8080:8080
  signaling:
    container_name: signaling
    build: ./signaling
    expose:
      - 9999
    labels:
      - "traefik.frontend.rule=Host:docker.localhost;PathPrefix:/signaling/subscribe;"
      - "traefik.frontend.entryPoints=http"
      - "traefik.port=9999"
      - "traefik.enable=true"
  publisher:
    container_name: publisher
    build: ./publisher
    depends_on:
      - signaling
    devices:
      - /dev/video0:/dev/video0
    expose:
      - 9999
    labels:
      - "traefik.frontend.rule=Host:docker.localhost;PathPrefix:/receive_client_offer;"
    command: ["./wait-for-it.sh", "signaling:9999", "--", "python", "-u", "/usr/src/publisher/publisher.py", "--port", "9999", "-v"]
  client:
    container_name: client
    build: ./client
    depends_on:
      - signaling
      - publisher
    expose:
      - 9999
    labels:
      - "traefik.frontend.rule=Host:docker.localhost;PathPrefix:/client,/socket.io,/receive_publisher_answer;"
    command: ["./wait-for-it.sh", "signaling:9999", "--", "npm", "start"]
For simplicity, we chose the same internal port number for all of our containers, which is
9999 in our example as shown in code listing 3.1. We configured the traefik base image with
this port number. This made all the containers reachable by traefik during execution. We used
the following parameters to configure and enable the routing provided by the traefik reverse
proxy.

• traefik.frontend.rule was used to define agreed distinct path identifiers for each
of the services and configure them as rules for traefik for each of the containers.
Each of these paths, when found as the path component of a given URI, worked as a
reference for traefik to identify and route the request to the appropriate containerized
service in the Docker orchestration. For instance, in code listing 3.1 above, we configured
the path /signaling/subscribe for the signaling service. When this path was encountered
after a valid domain name, traefik knew that the incoming network traffic should be
forwarded to the container executing the signaling service.
• traefik.frontend.entryPoints was used to bind the service to traefik’s http entry point,
so that only network requests using the http protocol scheme are accepted.
• traefik.port was used to tell traefik which port of the container requests should be
forwarded to.
• traefik.enable was used to designate whether the traefik reverse proxy routing
should be enabled for the particular service.
• The docker.domain parameter was used in the command of the traefik service in order to
designate docker.localhost as the domain name for accessing the orchestrated services.
For instance, the URL http://docker.localhost:9999/client helped traefik to determine that
this specific network request is intended for the service running inside the container
named client. The path /client was configured in the traefik.frontend.rule of the client
service in our orchestration.
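The effect of the Host and PathPrefix rules can be illustrated with a small Python model.
This is a simplified sketch and not traefik’s actual matching algorithm: the rule table is
copied from code listing 3.1, while the function name route and the choice to prefer the
longest matching prefix are our own assumptions for the illustration.

```python
# Frontend rules copied from code listing 3.1: service -> (host, path prefixes).
RULES = {
    "signaling": ("docker.localhost", ["/signaling/subscribe"]),
    "publisher": ("docker.localhost", ["/receive_client_offer"]),
    "client": (
        "docker.localhost",
        ["/client", "/socket.io", "/receive_publisher_answer"],
    ),
}


def route(host, path):
    """Return the service whose Host and PathPrefix rule matches, else None.

    `host` is the host name without the port. If several prefixes match,
    the longest one wins (an assumption of this sketch).
    """
    best_service = None
    best_length = -1
    for service, (rule_host, prefixes) in RULES.items():
        if host != rule_host:
            continue
        for prefix in prefixes:
            if path.startswith(prefix) and len(prefix) > best_length:
                best_service = service
                best_length = len(prefix)
    return best_service
```

For example, route("docker.localhost", "/client") selects the client service, whereas a
request for a different host name or an unknown path is not routed to any container.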
Under this setting, we needed to access the intended resource from the web browser using
the domain name designated for the traefik reverse proxy, which is docker.localhost, and
provide the correct path name and port number in accordance with the traefik rule configured
as part of the recipe for the Docker Compose orchestration. Moreover, we made appropriate
modifications to the source code to enable internal HTTP requests to be routed correctly over
the reverse proxy. Under these settings, we achieved our first draft of a functioning
implementation executing inside a containerized virtual environment on a single Linux host.
This was the appropriate checkpoint to further refactor the implementation and achieve looser
coupling and better security among our services. We wanted to build an IoT solution prototype
in which the layer containing the sensors is most often located at a considerable distance
from the application layer or the intermediate cloud data processing layer. We needed to stop
executing the publisher, the signaling and the subscriber components as part of the same
orchestration and distribute them, possibly onto separate hosts. For the time being, we
separated the publisher component from the signaling and subscriber components and moved it
to a Raspberry Pi device equipped with a web camera module, as shown in Figure 3.8.
Figure 3.8.: Isolated publisher module configured in raspberry pi
We configured a standalone Docker container virtualization environment on the Raspberry
Pi in order to execute the publisher service in a container which encapsulates all its
required dependencies. Furthermore, we bundled the dependencies of the publisher module
into a Docker image built specifically to be executable on the Raspberry Pi, which uses an
ARM processor architecture, as shown in code listing 3.2.
Listing 3.2: Dockerfile for publisher having Docker image executable in ARM architecture
FROM sarkar1986chandan/aiortc_arm_python:1.0
COPY ./publisher.py /usr/src/publisher.py
WORKDIR /usr/src
RUN chmod +x /usr/src/publisher.py
We kept the Raspberry Pi and the consumer host on the same network to ease our testing.
While testing our implementation, we encountered a challenge with the aiortc framework
when accessing the web camera module directly and transmitting the video stream, specifically
when the signaling service is mediating between the publisher and subscriber services. We
analyzed the behavior using a direct feed from the web camera as well as a prerecorded video
feed, which we used for debugging purposes. We observed that the video stream is transmitted
and consumed normally when a prerecorded video feed is read from a file. However, it suffers
significant frame drops when the web camera module is accessed directly.
In order to improve the quality of the video stream, we refactored the publisher service and
created an underlying recorder service. We tried two different frameworks to access the
camera module and record the video, namely OpenCV and ffmpeg. Using OpenCV, recording was
very slow, and we ended up in a situation where the video stream transmission by aiortc
outpaced the recording, which quickly disrupted the stream transmission. At this point, we
switched to ffmpeg for video capture and achieved a better yet inconsistent outcome. Code
listing 3.3 below, taken from the docker-compose recipe on the Raspberry Pi device, describes
our container orchestration on the publisher side.
Listing 3.3: Orchestration of revised publisher running in Raspberry Pi
version: '3'
services:
  recorder:
    container_name: recorder
    build: ./recorder
    devices:
      - /dev/video0:/dev/video0
    volumes:
      - record-volume:/record
  publisher:
    depends_on:
      - recorder
    container_name: publisher
    build: ./publisher
    volumes:
      - record-volume:/record
    ports:
      - 8888:8888
    command: ["python", "/usr/src/publisher.py", "--port", "8888", "-v"]
volumes:
  record-volume:
We placed the recording file in an agreed volume shared by both containers, which run the
recorder and the publisher service respectively. We let the aiortc framework running in
the publisher service read the video file and transmit the video.
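Because the publisher reads a file that the recorder is still writing, the publisher should not
start before the recording contains data. The following Python sketch shows one possible guard
for this startup race. It is an illustration only: the function name wait_for_recording, the
thresholds and the file path in the usage note are hypothetical and are not taken from our
publisher code, and the guard does not prevent the transmission from catching up with the
recording later on.

```python
import os
import time


def wait_for_recording(path, min_bytes=1, timeout=10.0, poll=0.1):
    """Block until `path` exists and holds at least `min_bytes` bytes.

    Returns the observed file size. Raises TimeoutError if the recorder
    has not written enough data within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            size = os.path.getsize(path)
        except OSError:
            # The recorder has not created the file yet.
            size = 0
        if size >= min_bytes:
            return size
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} has only {size} bytes after {timeout}s")
        time.sleep(poll)
```

In the publisher, such a call, for example wait_for_recording("/record/<file>", min_bytes=65536)
with the actual file name substituted, could precede the step in which the recorded file is
handed to the aiortc framework.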
On the consumer side, we continued to use the traefik reverse proxy to route the network
traffic between the signaling and client components. We used the same parameters to
configure traefik that we discussed before for our first prototype. The code in the signaling
component was modified to forward the network traffic towards the IP address assigned to the
Raspberry Pi by the router. Moreover, this time we used different port numbers for the
publisher service and the traefik reverse proxy service to avoid a conflict, because both
services are operational in the same network. Code listing 3.4 below describes the container
orchestration on the consumer side.
Listing 3.4: Orchestration of revised consumer and signaling
version: '3'
services:
  traefik:
    privileged: true
    command:
      --api --docker --docker.domain="docker.localhost"
      --docker.endpoint=unix:///var/run/docker.sock
      --docker.watch=true --docker.exposedbydefault="false"
    container_name: traefik
    image: traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 9999:80
      - 8080:8080
  signaling:
    container_name: signaling
    build: ./signaling
    expose:
      - 9999
    labels:
      - "traefik.frontend.rule=Host:docker.localhost;PathPrefix:/signaling/subscribe;"
  client:
    container_name: client
    build: ./client
    depends_on:
      - signaling
    expose:
      - 9999
    labels:
      - "traefik.frontend.rule=Host:docker.localhost;PathPrefix:/client,/socket.io,/receive_publisher_answer;"
    command: ["./wait-for-it.sh", "signaling:9999", "--", "npm", "start"]
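The forwarding step performed by the signaling component in this setup can be sketched as
follows. The sketch only builds the HTTP POST request and does not send it. The path
/receive_client_offer is taken from code listing 3.1 and the port 8888 from code listing 3.3;
the function name build_forward_request, the JSON payload format and the IP address used in the
usage note are assumptions made for the illustration.

```python
import json
import urllib.request


def build_forward_request(offer, publisher_host, port=8888,
                          path="/receive_client_offer"):
    """Build (but do not send) the POST that relays a subscriber offer.

    `offer` is a dict with the keys "type" and "sdp". Sending the payload
    as JSON is an assumption of this sketch.
    """
    body = json.dumps({"type": offer["type"], "sdp": offer["sdp"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{publisher_host}:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With an address such as 192.0.2.10 standing in for the Raspberry Pi, the request returned by
build_forward_request(offer, "192.0.2.10") could be sent with urllib.request.urlopen, and the
body of the HTTP response would then carry the answer of the publisher.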
In this experiment, we received a functional but inconsistent video stream as shown in
Figure 3.9. The recording of the video feed from the webcam was no longer outpaced by
the video transmission, but we observed a major performance bottleneck in terms of memory
consumption on the Raspberry Pi while recording the video. This resulted in
non-deterministic disruptions of the video transmission. We illustrate this behavior further
in our evaluation.
Figure 3.9.: Subscriber module consuming the video

Given these circumstances and observations, we now proceed to evaluate our implementation.
We succeeded in creating a P2P, component-based, hardware-independent IoT implementation
prototype using WebRTC. On the other hand, we encountered major compromises and challenges
in quality of service, which resulted in an inconsistent video stream.

4 Evaluation
We discussed our approach to building a prototypical application based on the propositions
we made in order to consider WebRTC as a viable standardization framework for IoT. In this
chapter, we evaluate our application in light of the previously specified design goals for a
framework that is suitable for the standardization of IoT.
• Capability to handle P2P communication
The proposed solution is capable of establishing a P2P communication session between the
concerned parties. The session is kept alive while both parties are engaged in active
communication. The implementation of the publish-subscribe design pattern enables us
to create a scalable and distributed architecture.
• Capability to enable fast hardware-independent prototyping
Our prototype features an orchestration on the container virtualization platform Docker.
Containers allow us to run the components of the application in an isolated containerized
environment with all required dependencies. It is possible to choose an appropriate
base image as a template to create containers which are independent of the hardware
or software platforms on which the orchestration of containers itself is taking place.
This makes it possible to choose any host machine which supports container technology
to host the application with minimal additional configuration.
• Capability to support web-based applications
Our application prototype is inherently web-based. The subscriber module initiates the
connection establishment procedure by generating its SDP object and subscribes to
the signaling server over the network with the help of an HTTP POST method. The signaling
server, in turn, transfers the SDP from the subscriber to the publisher over the network
via an HTTP POST method. Similarly, the SDP object generated by the publisher as a
response is communicated back to the subscriber with HTTP response messages.
• Capability to feature component-based architecture
We tried to modularize the architecture of the prototype from the ground up. We
adopted a publish-subscribe pattern for distributed system design and created three
distinct building blocks for our implementation, each having specific responsibilities.
Each of these three components is executed in a separate container which encapsulates
its respective dependencies.
Now that we have evaluated our prototypical application against the predefined design goals,
we want to draw a direct comparison of our solution with the similar works we reviewed
earlier. Our solution attempts to address the issues we highlighted before. Our prototypical
application does not depend on a web browser in order to capture the camera feed and
transmit it as a video stream. The aiortc framework provides a native implementation of the
WebRTC specification stack, which enables it to operate independently, without the
intervention of a web browser.
Our solution features hardware-independent fast prototyping with the help of the Linux
container-based virtualization platform Docker. This not only enables our application to be
executable on any Linux system having a Docker orchestration environment, but also lets us
quickly scale our implementation by adding new services or modifying existing ones.
Our solution is not focused on particular use cases. We adopted the publish-subscribe design
pattern for distributed systems in order to gain qualities such as scalability, loose
coupling, and security. Therefore, our prototype has the potential to serve as a template
for building complex web-based, component-oriented, loosely coupled and secure P2P
distributed applications. Along with these benefits, we also encountered some critical
challenges, which require elaboration.
The WebRTC specification by default does not provide any utility for a local peer to
create a connection to a remote peer. It delegates the responsibility of creating a channel
of communication between the peers to the implementer of WebRTC. This channel of
communication is encapsulated by the concept called signaling in WebRTC, as we discussed
before. This flexibility makes it possible for implementers to choose their preferred option
for implementing the signaling utility. The aiortc framework, unfortunately, encapsulates
its own implementation of the signaling server. It does not offer an interface to plug in an
alternative implementation for the signaling. This design decision is rigid and does not
follow the WebRTC specification.
We tested the application using a prerecorded video feed as well as a camera module.
We observed that with a prerecorded video feed the framework is capable of transmitting
a video stream of acceptable quality. However, when a real web camera is involved, the
performance of the resulting video stream is severely compromised. The stream
is slow and not usable for practical purposes. In order to counter this issue, we fell back
on implementing a real-time recording service instead of transmitting the feed from the
camera directly. The following are the frameworks that we tried to work with in order to
implement the recording service.
• OpenCV is a computer vision library with Python bindings that we tried to use in order to
record the video. Firstly, we encountered considerable challenges in building OpenCV for the
ARM architecture on Linux. This was required in order to execute OpenCV inside a Docker
container on the Raspberry Pi. Secondly, in our setup we could only get OpenCV to work on
Linux with Python version 2. The rest of the Python frameworks we used in our
implementation depended on Python version 3. This conflict in Python versions made us
implement the recording feature entirely in a separate container.
Upon executing the services, we discovered that recording the video feed from the
camera on the Raspberry Pi using OpenCV is quite a slow process. Therefore, it was
outpaced by the concurrently running video transmission process, which quickly broke the
video transmission. In order to mitigate this issue, we needed to look for an alternative
framework for video recording.
• ffmpeg is a multimedia framework suite that can be used from the command line in a
Linux environment, and it is easier to use compared to OpenCV. We tried to use ffmpeg
as an alternative to OpenCV in order to record video. We still implemented the recording
service in a separate container in order to execute the recording and the transmission of
the video feed concurrently.
This new arrangement produced a better outcome compared to the previous approach
in terms of the quality of the stream and ease of use. However, we encountered heavy
memory consumption on the Raspberry Pi device while the recording was going on.
Since the Raspberry Pi is a single-board computer with little memory, the high memory
consumption of ffmpeg caused the device to become unresponsive a little while after the
start of the video transmission. This caused disruptions of a non-deterministic nature in
the video stream.
Thus, neither of our approaches to recording the camera feed proved to be viable.
At this point, we take stock of our observations regarding the behavior of the chosen
framework aiortc, particularly in the presence of the camera module. We could not transmit
the direct feed from the camera without a significant compromise in the quality of the video
playback at the consumer end. Moreover, while trying to use the recording utility, we
encountered a significant memory shortage on the Raspberry Pi, which rendered it
unresponsive. This resulted in a functioning video stream which is susceptible to disruption
after an arbitrary time from the start. From these observations, we suspect that the
framework uses an inappropriate encoding strategy for the video feed while using the camera
module. For the time being, this behavior remains an open issue with the framework in our
implementation.
With our current implementation, the signaling module needs to know the reachability
information of the publisher module so that the subscription offer from the subscriber can
be transferred to the publisher. This approach is not ideal for achieving complete loose
coupling and may create a security loophole to be exploited by a malicious attacker.
The signaling server and the subscriber module at present run inside the same Docker
orchestration on the same host machine. This is also a notable drawback in terms of loose
coupling and security, and it does not accurately reflect a realistic IoT application setup.
In a practical scenario, we need to be able to deploy the signaling service on a separate
host, possibly as a service in the public network.
We discussed the newly proposed QUIC specification for WebRTC, but we could not use
it in our implementation. In our study, we found some code snippets in Google’s
Chromium project repository [70], written in C++, without any useful reference for
usage. Furthermore, from our study, we got the impression that this API is experimental and
inherently dependent on Google’s Chromium web browser.

5 Conclusion
In this study, we explored the conceptual foundation of IoT and attempted to understand
its impact on the modern networking landscape. We introduced the traditional and modern
specifications of WebRTC while proposing it as a suitable standardization framework to
build IoT solutions. We reviewed some of the recent related works on the subject matter,
which feature IoT-focused solutions based on WebRTC. Each of these applications was
either focused on its specific use cases or inherently dependent on web browsers that
support the WebRTC specification stack. We analyzed the aforementioned limitations and
set our objective to evaluate the potential of WebRTC as a standard framework to build IoT
applications. We attempted to remove the dependency of WebRTC on web browsers,
since it is not optimal for IoT sensors to invoke browsers. We also made efforts to adopt one
of the distributed system design patterns in order to develop a scalable and maintainable
prototypical application. Finally, we evaluated our application with respect to the design
goals which we set earlier for a potential standardization framework for IoT.
On the positive side, we created a component-based, scalable and loosely coupled distributed
IoT application. It is a web-based orchestration of three distinct functional modules having
their dependencies self-contained inside respective containers. Therefore, they are
executable in any environment supporting container virtualization, which contributes to the
aspect of platform-independent fast prototyping. The publisher and subscriber modules
establish a P2P connection with the help of the signaling server module as an intermediary
decoupling agent in order to realize the WebRTC specification. The publisher no longer
requires a web browser which implements the WebRTC specification stack. Our solution
addresses the issues we highlighted while defining our goal for this study. Thus we were able
to create a strong argument in favor of considering WebRTC as a standardization framework
for IoT.

On the other hand, we encountered substantial challenges in the implementation phase,
which compromised the quality of service. We encountered a severe drop in the performance
of the video stream while using the camera module, often resulting in a video stream which
is unusable for real-world use cases. Our alternative approach of using a recorder utility
to avoid the direct camera feed improved the quality of the video stream but, on the other
hand, resulted in a severe performance bottleneck on the Raspberry Pi and subsequent service
disruption. This can be investigated in future with a deeper analysis of the associated
frameworks, a better signaling strategy and more appropriate video encoding options in the
implementation.
The current implementation features a signaling server running inside the same orchestration
environment as the subscriber module. Furthermore, the signaling server needs to
know the network address of the publisher. These could result in security loopholes for the
entire application. In order to address these concerns, one could isolate the signaling server
component on a separate host. Moreover, one could revise the implementation in such a way
that the publisher needs to register with the signaling server with its own reachability
information before initiation of the P2P connection. This would improve loose coupling
and promote security for the overall system.
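The registration idea could start from a structure as simple as the following Python sketch.
It is a hypothetical illustration of possible future work and not part of our implementation:
the class name PublisherRegistry, its methods and the identifiers used in the usage note are
assumptions, and the registry is kept in memory without any authentication.

```python
class PublisherRegistry:
    """In-memory map from a publisher id to its reachability information."""

    def __init__(self):
        self._publishers = {}

    def register(self, publisher_id, host, port):
        """Store where a publisher can be reached; called by the publisher."""
        if not publisher_id or not host:
            raise ValueError("publisher_id and host are required")
        port = int(port)
        if not 0 < port < 65536:
            raise ValueError("port out of range")
        self._publishers[publisher_id] = (host, port)

    def unregister(self, publisher_id):
        """Forget a publisher; unknown ids are ignored."""
        self._publishers.pop(publisher_id, None)

    def resolve(self, publisher_id):
        """Return (host, port), or None if the publisher never registered."""
        return self._publishers.get(publisher_id)
```

A publisher would call register("cam-1", "192.0.2.10", 8888) on startup, and the signaling
server would call resolve("cam-1") when an offer arrives, so that no publisher address has to
be configured in the signaling code itself.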
With the further development and stable release of the QUIC specification of WebRTC in the
near future, it should be possible to develop IoT solutions which are independent of web
browsers using the QUIC protocol stack. QUIC, as we discussed in this study, has some key
promises to improve on traditional network communication. Using QUIC could significantly
boost the performance and quality of video streams in media transmission.
In light of our evaluation of WebRTC as a standardization framework for platform-independent
fast prototyping of IoT solutions, we conclude this study with an affirmative outcome,
noting the critical challenges which were encountered and possible ways to mitigate
them.

Bibliography
[1] M. Antunes, C. Silva, and J. Barranca, “A telemedicine application using webrtc,” Proce-
dia Computer Science, vol. 100, pp. 414–420, 2016.
[2] Y. Sulema and G. Rozinaj, “Webrtc-based 3d videoconferencing system,” in ELMAR,
2017 International Symposium. IEEE, 2017, pp. 193–196.
[3] Affordable live streaming camera. [Online]. Available:
https://www.bunver.com/building-an-affordable-live-streaming-camera-using-a-raspberry-pi/
[4] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of
things: A survey on enabling technologies, protocols, and applications,” IEEE Commu-
nications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.
[5] I. S. Udoh and G. Kotonya, “Developing iot applications: challenges and frameworks,”
IET Cyber-Physical Systems: Theory & Applications, 2017.
[6] S. Loreto and S. P. Romano, Real-Time Communication with WebRTC: Peer-to-Peer in
the Browser. O’Reilly Media, Inc., 2014.
[7] Webrtc official. [Online]. Available: https://webrtc.org/
[8] Quic a new internet transport. [Online]. Available:
https://www.ietf.org/proceedings/96/slides/slides-96-quic-5.pdf
[9] J. Janak and H. Schulzrinne, “Framework for rapid prototyping of distributed iot appli-
cations powered by webrtc,” in Principles, Systems and Applications of IP Telecommu-
nications (IPTComm), 2016. IEEE, 2016, pp. 1–7.
[10] M. G. Xavier, M. V. Neves, F. D. Rossi, T. C. Ferreto, T. Lange, and C. A. De Rose, “Perfor-
mance evaluation of container-based virtualization for high performance computing
environments,” in Parallel, Distributed and Network-Based Processing (PDP), 2013 21st
Euromicro International Conference on. IEEE, 2013, pp. 233–240.
[11] Docker official documentation. [Online]. Available:
https://docs.docker.com/engine/docker-overview/
[12] P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, “The many faces of
publish/subscribe,” ACM computing surveys (CSUR), vol. 35, no. 2, pp. 114–131, 2003.
[13] Iot devices will outnumber worlds population for the first time. [Online]. Available:
https://www.zdnet.com/article/iot-devices-will-outnumber-the-worlds-population-this-year-for-the-first-time/
[14] Video meets iot. [Online]. Available:
https://www.mckinsey.com/industries/high-tech/our-insights/video-meets-the-internet-of-things
[15] Kurento media server and framework. [Online]. Available: http://www.kurento.org
[16] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,” Computer net-
works, vol. 54, no. 15, pp. 2787–2805, 2010.
[17] D. Evans, “The internet of things: How the next evolution of the internet is changing
everything,” CISCO white paper, vol. 1, no. 2011, pp. 1–11, 2011.
[18] J. Gantz and D. Reinsel, “The digital universe in 2020: Big data, bigger digital shadows,
and biggest growth in the far east,” IDC iView: IDC Analyze the future, vol. 2007, no.
2012, pp. 1–16, 2012.
[19] S. Taylor, “The next generation of the internet revolutionizing the way we work, live,
play, and learn,” CISCO, San Francisco, CA, USA, CISCO Point of View, vol. 12, 2013.
[20] Internet of things architecture project iot-a. [Online]. Available: https://cordis.europa.eu/project/rcn/95713_en.html
[21] R. Khan, S. U. Khan, R. Zaheer, and S. Khan, “Future internet: the internet of things ar-
chitecture, possible applications and key challenges,” in Frontiers of Information Tech-
nology (FIT), 2012 10th International Conference on. IEEE, 2012, pp. 257–260.
[22] Z. Yang, Y. Yue, Y. Yang, Y. Peng, X. Wang, and W. Liu, “Study and application on the
architecture and key technologies for iot,” in Multimedia Technology (ICMT), 2011 In-
ternational Conference on. IEEE, 2011, pp. 747–751.
[23] M. Wu, T.-J. Lu, F.-Y. Ling, J. Sun, and H.-Y. Du, “Research on the architecture of internet
of things,” in Advanced Computer Theory and Engineering (ICACTE), 2010 3rd Interna-
tional Conference on, vol. 5. IEEE, 2010, pp. V5–484.
[24] M. A. Chaqfeh and N. Mohamed, “Challenges in middleware solutions for the internet
of things,” in Collaboration Technologies and Systems (CTS), 2012 International Confer-
ence on. IEEE, 2012, pp. 21–26.
[25] M. R. Abdmeziem, D. Tandjaoui, and I. Romdhani, “Architecting the internet of things:
state of the art,” in Robots and Sensor Clouds. Springer, 2016, pp. 55–75.
[26] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): A vi-
sion, architectural elements, and future directions,” Future generation computer sys-
tems, vol. 29, no. 7, pp. 1645–1660, 2013.
[27] E. Ferro and F. Potorti, “Bluetooth and wi-fi wireless protocols: a survey and a compari-
son,” IEEE Wireless Communications, vol. 12, no. 1, pp. 12–26, 2005.
[28] S. Sesia, M. Baker, and I. Toufik, LTE-the UMTS long term evolution: from theory to prac-
tice. John Wiley & Sons, 2011.

[29] A. Ghosh, R. Ratasuk, B. Mondal, N. Mangalvedhe, and T. Thomas, “Lte-advanced: next-
generation wireless broadband technology,” IEEE wireless communications, vol. 17,
no. 3, 2010.
[30] Open automotive alliance. [Online]. Available: https://www.openautoalliance.net/#about
[31] D. J. Cook, A. S. Crandall, B. L. Thomas, and N. C. Krishnan, “Casas: A smart home in a
box,” Computer, vol. 46, no. 7, pp. 62–69, 2013.
[32] L. Yongfu, S. Dihua, L. Weining, and Z. Xuebo, “A service-oriented architecture for the
transportation cyber-physical systems,” in Control Conference (CCC), 2012 31st Chinese.
IEEE, 2012, pp. 7674–7678.
[33] E. Miller, “An introduction to the resource description framework,” Bulletin of the Amer-
ican Society for Information Science and Technology, vol. 25, no. 1, pp. 15–19, 1998.
[34] P. Patel and D. Cassou, “Enabling high-level application development for the internet of
things,” Journal of Systems and Software, vol. 103, pp. 62–84, 2015.
[35] X. T. Nguyen, H. T. Tran, H. Baraki, and K. Geihs, “Frasad: A framework for model-driven
iot application development,” in Internet of Things (WF-IoT), 2015 IEEE 2nd World Fo-
rum on. IEEE, 2015, pp. 387–392.
[36] N. Koshizuka and K. Sakamura, “Ubiquitous id: standards for ubiquitous computing
and the internet of things,” IEEE Pervasive Computing, no. 4, pp. 98–101, 2010.
[37] L. Li, H. Xiaoguang, C. Ke, and H. Ketai, “The applications of wifi-based wireless sensor
network in internet of things and smart grid,” in Industrial Electronics and Applications
(ICIEA), 2011 6th IEEE Conference on. IEEE, 2011, pp. 789–793.
[38] T. Yokotani and Y. Sasaki, “Comparison with http and mqtt on required network re-
sources for iot,” in Control, Electronics, Renewable Energy and Communications (IC-
CEREC), 2016 International Conference on. IEEE, 2016, pp. 1–6.
[39] D. Locke, “Mqtt v3.1 protocol specification,” International Business Machines
Corporation (IBM) and Eurotech, p. 42, 2010.
[40] A. Stanford-Clark and H. L. Truong, “Mqtt for sensor networks (mqtt-sn) protocol spec-
ification,” International business machines (IBM) Corporation version, vol. 1, 2013.
[41] H. Suo, J. Wan, C. Zou, and J. Liu, “Security in the internet of things: a review,” in Com-
puter Science and Electronics Engineering (ICCSEE), 2012 international conference on,
vol. 3. IEEE, 2012, pp. 648–651.
[42] G. Kortuem, F. Kawsar, V. Sundramoorthy, and D. Fitton, “Smart objects as building
blocks for the internet of things,” IEEE Internet Computing, vol. 14, no. 1, pp. 44–51,
2010.
[43] J. Rivera and R. van der Meulen, “Gartner says the internet of things installed base will
grow to 26 billion units by 2020,” Stamford, conn., December, vol. 12, 2013.

[44] J. F. Kurose and K. W. Ross, Computer networking: a top-down approach. Addison-Wesley,
Boston, 2009, vol. 4.
[45] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Han-
dley, and E. Schooler, “Sip: session initiation protocol,” Tech. Rep., 2002.
[46] Multiple packetization times in the session description protocol problem statement
requirements and solution. [Online]. Available: https://tools.ietf.org/id/draft-garcia-mmusic-multiple-ptimes-problem-03.html
[47] Webrtc network address translation. [Online]. Available: https://tools.ietf.org/id/draft-cbran-rtcweb-nat-02.html
[48] Introduction to http/2. [Online]. Available: https://developers.google.com/web/fundamentals/performance/http2/
[49] I. Grigorik, High Performance Browser Networking: What every web developer should
know about networking and web performance. O’Reilly Media, Inc., 2013.
[50] Evolution of web protocols http 2 and quic. [Online]. Available: https://www.callstats.io/2017/02/03/web-protocols-http2-quic/
[51] Quic as a multiplexed stream transport over udp. [Online]. Available: https://www.chromium.org/quic
[52] Webrtc with quic presentation at boston meetup. [Online]. Available: https://www.youtube.com/watch?v=mIvyOFu1c1Q
[53] Webrtc data channels. [Online]. Available: https://tools.ietf.org/html/draft-ietf-rtcweb-data-channel-13
[54] A comparison between sctp and quic. [Online]. Available: https://tools.ietf.org/html/draft-joseph-quic-comparison-quic-sctp-00
[55] Quic api for webrtc. [Online]. Available: https://w3c.github.io/webrtc-quic/#quic-transport*
[56] I. Fette and A. Melnikov, “The websocket protocol,” Tech. Rep., 2011.
[57] N. Regola and J.-C. Ducom, “Recommendations for virtualization technologies in high
performance computing,” in Cloud Computing Technology and Science (CloudCom),
2010 IEEE Second International Conference on. IEEE, 2010, pp. 409–416.
[58] University of waterloo - publish subscribe pattern. [Online]. Available: https://www.student.cs.uwaterloo.ca/~cs446/1171/Arch_Design_Activity/PublishSubscribe.pdf
[59] Wikipedia - publish subscribe pattern. [Online]. Available: https://en.wikipedia.org/wiki/Publish–subscribe_pattern#Loose_coupling

[60] aiortc - python library for webrtc. [Online]. Available: https://aiortc.readthedocs.io/en/latest/index.html
[61] pyav - python wrapper for ffmpeg. [Online]. Available: https://docs.mikeboers.com/pyav/develop/index.html
[62] Ffmpeg - crossplatform framework to work with media stream. [Online]. Available: http://ffmpeg.org
[63] M. Cantelon, M. Harter, T. Holowaychuk, and N. Rajlich, Node.js in Action. Manning
Greenwich, 2014.
[64] Express - official homepage for express. [Online]. Available: https://expressjs.com
[65] Request - official github page for request. [Online]. Available: https://github.com/request/request
[66] Socket.io - official homepage for socket.io. [Online]. Available: https://socket.io
[67] Mdn web api official. [Online]. Available: https://developer.mozilla.org/kab/docs/Web/API
[68] K. Araujo, R. Best, D. Heitmueller, and D. Tikhonov, “Network access using reverse
proxy,” Nov. 24 2005, US Patent App. 11/078,001.
[69] Traefik reverse proxy official page. [Online]. Available: https://traefik.io
[70] Google chromium repository. [Online]. Available: https://webrtc.googlesource.com/src/+/65fc62e9dd8a8716db625aaef76ab92f542ecc5a/webrtc/p2p

A Appendix
The repository containing all program implementations, together with the documentation,
is provided on the attached compact disc.

Statutory Declaration
I hereby declare, in accordance with § 17 (2) APO, that I wrote the above master's thesis
independently and used no sources or aids other than those indicated.
Bamberg, 22.12.2018
....................................
Chandan Sarkar
