Overlay Networks Toward Information Networking
Overlay
Networks
Toward Information Networking
© 2010 Taylor and Francis Group, LLC
OTHER TELECOMMUNICATIONS BOOKS FROM AUERBACH
Broadband Mobile Multimedia:
Techniques and Applications
Yan Zhang, Shiwen Mao, Laurence T. Yang,
and Thomas M. Chen
ISBN: 978-1-4200-5184-1
Carrier Ethernet: Providing the Need for Speed
Gilbert Held
ISBN: 978-1-4200-6039-3
Cognitive Radio Networks
Yang Xiao and Fei Hu
ISBN: 978-1-4200-6420-9
Contemporary Coding Techniques and
Applications for Mobile Communications
Onur Osman and Osman Nuri Ucan
ISBN: 978-1-4200-5461-3
Converging NGN Wireline and Mobile 3G
Networks with IMS: Converging NGN and
3G Mobile
Rebecca Copeland
ISBN: 978-0-8493-9250-4
Cooperative Wireless Communications
Yan Zhang, Hsiao-Hwa Chen, and Mohsen Guizani
ISBN: 978-1-4200-6469-8
Data Scheduling and Transmission Strategies
in Asymmetric Telecommunication
Environments
Abhishek Roy and Navrati Saxena
ISBN: 978-1-4200-4655-7
Encyclopedia of Wireless and Mobile
Communications
Borko Furht
ISBN: 978-1-4200-4326-6
IMS: A New Model for Blending Applications
Mark Wuthnow, Jerry Shih, and Matthew Stafford
ISBN: 978-1-4200-9285-1
The Internet of Things: From RFID to the
Next-Generation Pervasive Networked
Systems
Lu Yan, Yan Zhang, Laurence T. Yang,
and Huansheng Ning
ISBN: 978-1-4200-5281-7
Introduction to Communications
Technologies: A Guide for Non-Engineers,
Second Edition
Stephan Jones, Ron Kovac, and Frank M. Groom
ISBN: 978-1-4200-4684-7
Long Term Evolution: 3GPP LTE Radio
and Cellular Technology
Borko Furht and Syed A. Ahson
ISBN: 978-1-4200-7210-5
MEMS and Nanotechnology-Based Sensors
and Devices for Communications,
Medical and Aerospace Applications
A. R. Jha
ISBN: 978-0-8493-8069-3
Millimeter Wave Technology in Wireless PAN,
LAN, and MAN
Shao-Qiu Xiao and Ming-Tuo Zhou
ISBN: 978-0-8493-8227-7
Mobile Telemedicine: A Computing and
Networking Perspective
Yang Xiao and Hui Chen
ISBN: 978-1-4200-6046-1
Optical Wireless Communications:
IR for Wireless Connectivity
Roberto Ramirez-Iniguez, Sevia M. Idrus,
and Ziran Sun
ISBN: 978-0-8493-7209-4
Satellite Systems Engineering in an
IPv6 Environment
Daniel Minoli
ISBN: 978-1-4200-7868-8
Security in RFID and Sensor Networks
Yan Zhang and Paris Kitsos
ISBN: 978-1-4200-6839-9
Security of Mobile Communications
Noureddine Boudriga
ISBN: 978-0-8493-7941-3
Unlicensed Mobile Access Technology:
Protocols, Architectures, Security,
Standards and Applications
Yan Zhang, Laurence T. Yang, and Jianhua Ma
ISBN: 978-1-4200-5537-5
Value-Added Services for Next Generation
Networks
Thierry Van de Velde
ISBN: 978-0-8493-7318-3
Vehicular Networks: Techniques, Standards,
and Applications
Hassnaa Moustafa and Yan Zhang
ISBN: 978-1-4200-8571-6
WiMAX Network Planning and Optimization
Yan Zhang
ISBN: 978-1-4200-6662-3
Wireless Quality of Service:
Techniques, Standards, and Applications
Maode Ma and Mieso K. Denko
ISBN: 978-1-4200-5130-8
AUERBACH PUBLICATIONS
www.auerbach-publications.com
To Order Call: 1-800-272-7737 • Fax: 1-800-374-3401
E-mail: orders@crcpress.com
Overlay Networks
Toward Information Networking
Sasu Tarkoma
Auerbach Publications
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2010 by Taylor and Francis Group, LLC
Auerbach Publications is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-1-4398-1373-7 (Ebook-PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of
users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has
been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the Auerbach Web site at
http://www.auerbach-publications.com
Contents
Preface
About the Author
1 Introduction
1.1 Overview
1.2 Overlay Technology
1.3 Applications
1.4 Properties of Data
1.5 Structure of the Book
2 Network Technologies
2.1 Networking
2.2 Firewalls and NATs
2.3 Naming
2.4 Addressing
2.5 Routing
2.5.1 Overview
2.5.2 Interdomain
2.5.3 Border Gateway Protocol
2.5.4 Current Challenges
2.5.5 Compact Routing
2.6 Multicast
2.6.1 Network-layer Multicast
2.6.2 Application-layer Multicast
2.6.3 Chaining TCP Connections for Multicast
2.7 Network Coordinates
2.7.1 Vivaldi Centralized Algorithm
2.7.2 Vivaldi Distributed Algorithm
2.7.3 Applications
2.7.4 Triangle Inequality Violation
2.8 Network Metrics
2.8.1 Routing Algorithm Invariants
2.8.2 Convergence
2.8.3 Shortest Path
2.8.4 Routing Table Size and Stretch
2.8.5 Forwarding Load
2.8.6 Churn
2.8.7 Other Metrics
3 Properties of Networks and Data
3.1 Data on the Internet
3.1.1 Video Delivery
3.1.2 P2P Traffic
3.1.3 Trends in Networking
3.2 Zipf’s Law
3.2.1 Overview
3.2.2 Zipf’s Law and the Internet
3.2.3 Implications for P2P
3.3 Scale-free Networks
3.4 Robustness
3.5 Small Worlds
4 Unstructured Overlays
4.1 Overview
4.2 Early Systems
4.3 Locating Data
4.4 Napster
4.5 Gnutella
4.5.1 Overview
4.5.2 Searching the Network
4.5.3 Efficient Keyword Lists
4.6 Skype
4.7 BitTorrent
4.7.1 Torrents and Swarms
4.7.2 Networking
4.7.3 Choking Mechanism
4.7.4 Antisnubbing
4.7.5 End Game
4.7.6 Trackerless Operation
4.7.7 BitTorrent Vulnerabilities
4.7.8 Service Capacity
4.7.9 Fluid Models for Performance Evaluation
4.8 Cross-ISP BitTorrent
4.9 Freenet
4.9.1 Overview
4.9.2 Bootstrapping
4.9.3 Identifier keys
4.9.4 Key-based Routing
4.9.5 Indirect Files
4.9.6 API
4.9.7 Security
4.10 Comparison
5 Foundations of Structured Overlays
5.1 Overview
5.2 Geometries
5.2.1 Trees
5.2.2 Hypercubes and Tori
5.2.3 Butterflies
5.2.4 de Bruijn graph
5.2.5 Rings
5.2.6 XOR Geometry
5.2.7 Summary
5.3 Consistent Hashing
5.4 Distributed Data Structures for Clusters
5.4.1 Linear Hashing
5.4.2 SDDS Taxonomy
5.4.3 LH* Overview
5.4.4 Ninja
6 Distributed Hash Tables
6.1 Overview
6.2 APIs
6.3 Plaxton’s Algorithm
6.3.1 Routing
6.3.2 Performance
6.4 Chord
6.4.1 Joining the Network
6.4.2 Leaving the Network
6.4.3 Routing
6.4.4 Performance
6.5 Pastry
6.5.1 Joining and Leaving the Network
6.5.2 Routing
6.5.3 Performance
6.5.4 Bamboo
6.6 Koorde
6.6.1 Routing
6.6.2 Performance
6.7 Tapestry
6.7.1 Joining and Leaving the Network
6.7.2 Routing
6.7.3 Performance
6.8 Kademlia
6.8.1 Joining and Leaving the Network
6.8.2 Routing
6.8.3 Performance
6.9 Content Addressable Network
6.9.1 Joining the Network
6.9.2 Leaving the Network
6.9.3 Routing
6.9.4 Performance
6.10 Viceroy
6.10.1 Joining the Network
6.10.2 Leaving the Network
6.10.3 Routing
6.10.4 Performance
6.11 Skip Graph
6.12 Comparison
6.12.1 Geometries
6.12.2 Routing Function
6.12.3 Churn
6.12.4 Asymptotic Trade-offs
6.12.5 Network Proximity
6.12.6 Adding Hierarchy to DHTs
6.12.7 Experimenting with Overlays
6.12.8 Criticism
7 Probabilistic Algorithms
7.1 Overview of Bloom Filters
7.2 Bloom Filters
7.2.1 False Positive Probability
7.2.2 Operations
7.2.3 d-left Counting Bloom Filter
7.2.4 Compressed Bloom Filter
7.2.5 Counting Bloom Filters
7.2.6 Hierarchical Bloom Filters
7.2.7 Spectral Bloom Filters
7.2.8 Bloomier Filters
7.2.9 Approximate State Machines
7.2.10 Perfect Hashing Scheme
7.2.11 Summary
7.3 Bloom Filters in Distributed Computing
7.3.1 Caching
7.3.2 P2P Networks
7.3.3 Packet Routing and Forwarding
7.3.4 Measurement
7.4 Gossip Algorithms
7.4.1 Overview
7.4.2 Design Considerations
7.4.3 Basic Models
7.4.4 Basic Shuffling
7.4.5 Enhanced Shuffling
7.4.6 Flow Control and Fairness
7.4.7 Gossip for Structured Overlays
8 Content-based Networking and Publish/Subscribe
8.1 Overview
8.2 DHT-based Data-centric Communications
8.2.1 Scribe
8.2.2 Bayeux
8.2.3 SplitStream
8.2.4 Overcast
8.2.5 Meghdoot
8.2.6 MEDYM
8.2.7 Internet Indirection Infrastructure
8.2.8 Data-oriented Network Architecture
8.2.9 Semantic Search
8.2.10 Distributed Segment Tree
8.2.11 Semantic Queries
8.3 Content-based Routing
8.4 Router Configurations
8.4.1 Basic Configuration
8.4.2 Structured DHT-based Overlays
8.4.3 Interest Propagation
8.5 Siena and Routing Structures
8.5.1 Routing Blocks
8.5.2 Definitions
8.5.3 Siena Filters Poset
8.5.4 Advertisements
8.5.5 Poset-derived Forest
8.5.6 Filter Merging
8.6 Hermes
8.7 Formal Specification of Content-based Routing Systems
8.7.1 Valid Routing Configuration
8.7.2 Weakly Valid Routing Configuration
8.7.3 Mobility-Safety
8.8 Pub/sub Mobility
9 Security
9.1 Overview
9.2 Attacks and Threats
9.2.1 Worms
9.2.2 Sybil Attack
9.2.3 Eclipse Attack
9.2.4 File Poisoning
9.2.5 Man-in-the-Middle Attack
9.2.6 DoS Attack
9.3 Securing Data
9.3.1 Self-Certifying Data
9.3.2 Merkle Trees
9.3.3 Information Dispersal
9.3.4 Secret-sharing Schemes
9.3.5 Smartcards for Bootstrapping Trust
9.3.6 Distributed Steganographic File Systems
9.3.7 Erasure Coding
9.3.8 Censorship Resistance
9.4 Security Issues in P2P Networks
9.4.1 Overview
9.4.2 Insider Attacks
9.4.3 Outsider Attacks
9.4.4 SybilGuard
9.4.5 Reputation Management with EigenTrust
9.5 Anonymous Routing
9.5.1 Mixes
9.5.2 Onion Routing
9.5.3 Tor
9.5.4 P2P Anonymization System
9.5.5 Censorship-resistant Lookup: Achord
9.5.6 Crowds
9.5.7 Hordes
9.5.8 Mist
9.6 Security Issues in Pub/Sub Networks
9.6.1 Hermes
9.6.2 EventGuard
9.6.3 QUIP
10 Applications
10.1 Amazon Dynamo
10.1.1 Architecture
10.1.2 Ring Membership
10.1.3 Partitioning Algorithm
10.1.4 Replication
10.1.5 Data Versioning
10.1.6 Vector Clocks
10.1.7 Coping with Failures
10.2 Overlay Video Delivery
10.2.1 Live Streaming
10.2.2 Video-on-Demand
10.3 SIP and P2PSIP
10.4 CDN Solutions
10.4.1 Overview
10.4.2 Akamai
10.4.3 Limelight
10.4.4 Coral
10.4.5 Comparison
11 Conclusions
References
Index
Preface
Data and media delivery have become hugely popular on the Internet, with well over
1 billion Internet users. Therefore, scalable and flexible information dissemination solutions
are needed. Much of the current development pertaining to services and service delivery
happens above the basic network layer and the TCP/IP protocol suite because of the need
to be able to rapidly develop and deploy them.
In recent years, various kinds of overlay networking technologies have emerged as an
active area of research and development. Overlay systems, especially peer-to-peer systems,
are technologies that can solve problems in massive information distribution and processing
tasks. The key aim of many of these technologies is to offer a deployable solution
for processing and distributing vast amounts of information, typically petabytes and more,
while at the same time keeping the scaling costs low.
The aim of this book is to present the state of the art in overlay technologies, examine
the key structures and algorithms used in overlay networks, and discuss their applications.
Overlay networks have been a very active area of research and development during the
last 10 years, and a substantial amount of scientific literature has formed around this topic.
This book has been inspired by the teaching notes and articles of the author in content-
based routing. The book is designed not only as a reference for overlay technologies, but also
as a textbook for a course in distributed overlay technologies and information networking
at the graduate level.
About the Author
Sasu Tarkoma received his M.Sc. and Ph.D. degrees in Computer Science from the University
of Helsinki, Department of Computer Science. He is currently a professor at Helsinki
University of Technology, Department of Computer Science and Engineering. He has
recently been appointed full professor at the University of Helsinki, Department of Computer
Science. He has managed and participated in national and international research projects
at the University of Helsinki, Helsinki University of Technology, and Helsinki Institute for
Information Technology (HIIT). He has worked in the IT industry as a consultant and chief
system architect, and he is a principal member of the research staff at Nokia Research Center. He
has over 100 publications, and has also contributed to several books on mobile middleware.
Ms. Nelli Tarkoma produced most of the diagrams used in this book.
1 Introduction
1.1 Overview
In recent years, various kinds of overlay networking technologies have emerged as an
active area of research and development. Overlay systems, especially peer-to-peer (P2P)
systems, are technologies that can solve problems in massive information distribution and
processing tasks. The key aim of many of these technologies is to offer a deployable
solution for processing and distributing vast amounts of information, typically petabytes
and more, while at the same time keeping the scaling costs low.
Data and media delivery have become hugely popular on the Internet. Currently there
are over 1.4 billion Internet users, well over 3 billion mobile phones, and 4 billion mobile
subscriptions. By 2000 the Google index reached the 1 billion indexed web resources mark,
and by 2008 it reached the trillion mark.
Multimedia content, especially video, is paving the way for truly versatile network
services that both compete with and extend existing broadcast-based media. As a consequence,
new kinds of social collaboration and advertisement mechanisms are being introduced
both in the fixed Internet and in the mobile world. This trend is heightened
by the ubiquitous nature of digital cameras. Indeed, this has created a lot of interest in
community-based services, in which users create their own content and make it available
to others.
These developments have had a profound impact on network requirements and perfor-
mance. Video delivery has become one of the recent services on the Web with the advent
of YouTube [67] and other social media Web sites. Moreover, the network impact is height-
ened by various P2P services. Estimates of P2P share of network traffic range from 50% to
70%. Cisco’s latest traffic forecast for 2009–2013 indicates that annual global IP traffic will
reach 667 exabytes in 2013, two-thirds of a zettabyte [79]. An exabyte (EB) is an SI unit of
information, and 1 EB equals $10^{18}$ bytes. The exabyte is followed by the zettabyte
(1 ZB = $10^{21}$ bytes) and the yottabyte (1 YB = $10^{24}$ bytes).
The traffic is expected to increase some 40% each year. Much
of this increase comes from the delivery of video data in various forms. Video delivery on
the Internet will see a huge increase, and the volume of video delivery in 2013 is expected
to be 700 times the capacity of the US Internet backbone in 2000. The study anticipates that
video traffic will account for 91% of all consumer traffic in 2013.
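The unit relationships and the growth figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch: the function names are hypothetical, the 100 EB starting volume in the usage example is a made-up illustration rather than a figure from the forecast, and a constant 40% annual rate is an assumption used only to show how compounding works.

```python
# Decimal (SI) byte units as defined in the text.
EXABYTE = 10**18
ZETTABYTE = 10**21


def exabytes_to_zettabytes(eb):
    """Convert a volume in exabytes to zettabytes."""
    return eb * EXABYTE / ZETTABYTE


def project_traffic(start_eb, annual_growth, years):
    """Compound a starting volume by a constant annual growth rate.

    Returns one value per year, beginning with the starting volume.
    The constant rate is an illustrative assumption.
    """
    return [start_eb * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # 667 EB expressed in zettabytes (about two-thirds of a zettabyte).
    print(exabytes_to_zettabytes(667))
    # Hypothetical starting volume of 100 EB growing 40% per year.
    for year, volume in enumerate(project_traffic(100.0, 0.40, 4)):
        print(year, round(volume, 1))
```

At 40% per year the volume nearly doubles every two years, since 1.4 × 1.4 = 1.96.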
According to the study, P2P traffic will continue to grow in absolute terms but will account
for a smaller share of Internet traffic than it does today. P2P systems in 2009
are transferring 3.3 EB of data per month. The study indicates that the P2P share of
consumer Internet traffic will drop to 20% by 2013, down from the current 50% (at the end
of 2008). Even though the P2P share may drop, most video delivery solutions, accounting
for much of the traffic increase, will utilize overlay technologies, which makes this area
crucial for ensuring efficient and scalable services.
A P2P network consists of nodes that cooperate in order to provide services to
each other. A pure P2P network consists of equal peers that are simultaneously
clients and servers. The P2P model differs from the client-server model, where
clients access services provided by logically centralized servers.
To date, P2P delivery has not been successfully combined with browser-based operation
and media sites such as YouTube. Nevertheless, a number of businesses have realized
the importance of scalable data delivery. For example, the game company Blizzard uses
P2P technology to distribute patches for the World of Warcraft game. Given the heavy
use of the network, P2P protocols such as BitTorrent can reduce network load through
peer-assisted data delivery: peer users cooperate to transfer large files over the
network.
1.2 Overlay Technology
Data structures and algorithms are central to today’s data communications. We may consider
circuit switching technology as an example of how information processing algorithms
are vital for products and how innovation changes markets. Early telephone systems were
based on manual circuit switching. Everything was done using human hands. Later systems
used electromechanical devices to connect calls, but they required laborious preconfigu-
ration of telephone numbers and had limited scalability. Modern digital circuit switching
algorithms evolved from these older semiautomatic systems and optimize the number of
connections in a switch. The nonblocking minimal spanning tree algorithm enabled the
optimization of these automatic switches. Any algorithm used to connect millions of calls
must be proven to be correct and efficient. The latest development changes the fundamen-
tals of telephone switching, because information is forwarded as packets on a hop-by-hop
basis and not via preestablished physical circuits. Today, this complex machinery enables
end-to-end connectivity irrespective of time and location.
Data structures are at the heart of the Internet. Network-level routers use efficient algorithms
for matching data packets to outgoing interfaces based on prefixes. Internet backbone
routers have to manage 200,000 routes and more in order to route packets between
systems. The matching techniques include prefix trees (tries) and ternary content addressable
memories (TCAMs) [268], which have to balance between matching efficiency and router memory.
Therefore, just as with telephone switches, optimization plays a major role in the develop-
ment of routers and routing systems.
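Prefix-based matching of packets to outgoing interfaces can be illustrated with a binary trie that performs longest-prefix matching. This is a software sketch only: the class name `PrefixTrie`, the interface names, and the example prefixes are hypothetical, and production routers rely on compressed tries or TCAM hardware rather than nested dictionaries.

```python
import ipaddress


class PrefixTrie:
    """Binary trie over the bits of IPv4 prefixes (illustrative sketch)."""

    def __init__(self):
        # Each node is a dict with optional "0"/"1" children and an
        # optional "iface" entry marking the end of a stored prefix.
        self.root = {}

    def insert(self, prefix, iface):
        """Store a route such as ("10.1.0.0/16", "eth1")."""
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["iface"] = iface

    def lookup(self, address):
        """Return the interface of the longest matching prefix, or None."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("iface")  # a /0 default route, if present
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific route seen so far.
            best = node.get("iface", best)
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "eth0")
    table.insert("10.1.0.0/16", "eth1")
    for addr in ("10.1.2.3", "10.2.0.1", "192.168.1.1"):
        print(addr, "->", table.lookup(addr))
```

The lookup walks at most 32 levels for IPv4, which shows the trade-off mentioned above: a plain trie is compact to describe but costs one memory access per bit, whereas a TCAM compares all prefixes in parallel at the price of expensive memory.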
The current generation of networks is being developed on top of the TCP/IP network layer
(layer 3 in the open systems interconnection (OSI) stack). These so-called overlay networks
come in various shapes and forms. Overlays make many implementation issues easier,
because network-level routers do not need to be changed. In many ways, overlay networks
represent a fundamental paradigm shift compared to older technologies such as circuit
switching and hierarchical routing.
Overlay networks are useful both in control and content plane scenarios. This division of
traffic into control and content is typical of current telecommunications solutions such as the
session initiation protocol (SIP); however, this division does not exist on the current Internet as
such. As control plane elements, overlays can be used to route control messages and connect
different entities. As content plane elements, they can participate in data forwarding and
dissemination.
An overlay network is a network that is built on top of an existing network. The
overlay therefore relies on the so-called underlay network for basic networking
functions, namely routing and forwarding. Today, most overlay networks are built
in the application layer on top of the TCP/IP networking suite. Overlay technolo-
gies can be used to overcome some of the limitations of the underlay, at the same
time offering new routing and forwarding features without changing the routers.
The nodes in an overlay network are connected via logical links that can span
many physical links. A link between two overlay nodes may take several hops in
the underlying network.
An overlay network therefore consists of a set of distributed nodes, typically client de-
vices or servers, that are deployed on the Internet. The nodes are expected to meet the
following requirements:
1. Support the execution of one or more distributed applications by providing infra-
structure for them.
2. Participate in and support high-level routing and forwarding tasks. The overlay is
expected to provide data-forwarding capabilities that are different from those that
are part of the basic Internet.
3. Deploy across the Internet in such a way that third parties can participate in the
organization and operation of the overlay network.
Figure 1.1 presents a layered view to overlay networks. The view starts from the underlay,
the network that offers the basic primitives of sending and receiving messages (packets).
The two obvious choices today are UDP and TCP as the transport layer protocols. TCP is
favored due to its connection-oriented nature, congestion control, and reliability.
After the underlay layer, we have the custom routing, forwarding, rendezvous, and
discovery functions of the overlay architecture. Routing pertains to the process of building
and maintaining routing tables. Forwarding is the process of sending messages toward their
destination, and rendezvous is a function that is used to resolve issues regarding some
identifier or node—for example, by offering indirection support in the case of mobility.
Discovery is an integral part of this layer and is needed to populate the routing table by
discovering both physically and logically nearby neighbors.
FIGURE 1.1
Layered view to overlay networks. Layers, from top to bottom: applications, services, and tools; services management; security and resource management, reliability, and fault tolerance; routing, forwarding, rendezvous, and discovery; network.
The next layer introduces additional functions, such as security and resource manage-
ment, reliability support, and fault tolerance. These are typically built on top of the basic
overlay functions mentioned above. Security pertains to the way node identities are assigned
and controlled, and messages and packets are secured. Security encompasses multiple
protocol layers and is responsible for ensuring that peers can maintain a sufficient level
of trust toward the system. Resource management is about taking content demand and
supply into account and ensuring that certain performance and reliability requirements are
met. For example, relevant issues are data placement and replication rate. Data replication
is also a basic mechanism for ensuring fault-tolerance. If one node fails, another can take
its place and, given that the data was replicated, there is no loss of information.
Above this layer, we have the services management for both monitoring and controlling
service lifecycles. When a service is deployed on top of an overlay, there need to be functions
for administering it and controlling various issues such as administrative boundaries, and
data replication and access control policies.
Finally, in the topmost layer we have the actual applications and services that are executed
on top of the layered overlay architecture. The applications rely on the overlay architecture
for scalable and resilient data discovery and exchange.
An overlay network offers a number of advantages over both centralized solutions
and solutions that introduce changes in routers. These include the following three key
advantages:
Incremental deployment: Overlay networks do not require changes to the existing
routers. This means that an overlay network can be grown node by node, and
with more nodes it is possible to both monitor and control routing paths across the
Internet from one overlay node to another. An overlay network can be built based
on standard network protocols and existing APIs—for example, the Sockets API of
the TCP/IP protocol stack.
Adaptable: The overlay algorithm can utilize a number of metrics when making rout-
ing and forwarding decisions. Thus the overlay can take application-specific con-
cerns into account that are not currently offered by the Internet infrastructure. Key
metrics include latency, bandwidth, and security.
Robust: An overlay network is robust to node and network failures due to its adaptable
nature. With a sufficient number of nodes in the overlay, the network may be able
to offer multiple independent (router-disjoint) paths to the same destination. At
best, overlay networks are able to route around faults.
The designers of an early overlay system called resilient overlay network (RON) [361] used
the idea of alternative paths to improve performance and to route around network faults.
Figure 1.2 illustrates how overlay technology can be used to route around faults. In this
example, there is a problem with the normal path between A and B across the Internet.
Now, the overlay can use a so-called detour path through C to send traffic to B. This will
result in some networking overhead but can be used to maintain communications between
A and B.
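The detour idea can be sketched as a comparison between the direct path and every one-hop relay path. This is a simplified illustration, not RON's actual algorithm: the function name `best_path`, the latency table, and its millisecond values are hypothetical, and a failed link is modelled as infinite latency.

```python
import math


def best_path(latency, src, dst):
    """Return (cost, path) for the cheaper of the direct path and the
    best one-hop detour.

    `latency` maps directed pairs (a, b) to a measured latency; a missing
    pair or math.inf means the link is currently unusable.
    """
    def cost(a, b):
        return latency.get((a, b), math.inf)

    best = (cost(src, dst), [src, dst])
    nodes = {node for pair in latency for node in pair}
    # Sorted iteration keeps the result deterministic when costs tie.
    for relay in sorted(nodes - {src, dst}):
        total = cost(src, relay) + cost(relay, dst)
        if total < best[0]:
            best = (total, [src, relay, dst])
    return best


if __name__ == "__main__":
    # The normal A-B path has failed; C offers a detour.
    measured = {("A", "B"): math.inf, ("A", "C"): 30.0, ("C", "B"): 40.0}
    print(best_path(measured, "A", "B"))
```

If neither the direct path nor any relay is usable, the function returns an infinite cost with the direct path, which a caller would treat as "destination unreachable". The extra hop through the relay is the networking overhead mentioned above.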
FIGURE 1.2
Improving resiliency using overlay techniques. The normal path from A to B across the Internet is broken (X); logical links through C route around the problem.
Overlay networks also face a number of challenges and limitations. The three central
challenges include the following:
• The real world: In practice, the typical underlay protocol, IP, does not provide universal
end-to-end connectivity due to the ubiquitous nature of firewalls and network
address translation (NAT) devices. This means that special solutions are needed to
overcome reachability issues. In addition, many overlay networks are oblivious to
the current organizational and management structures that exist in applications
and also in network designs. For example, most of the overlay solutions presented
in this book do not take Internet topology into account from the viewpoint of the
autonomous systems (ASs) and inter-AS traffic.
• Management and administration: Practical deployment requires that the overlay
network have a management interface. This is relatively easy to realize for a single
administrative domain; however, when there are many parties involved, the man-
agement of the overlay becomes nontrivial. Indeed, at the moment most overlays
involve a single administrative domain.
The administrator of an overlay network is typically removed from the actual
physical devices that participate in the overlay. This requires advanced techniques
for detecting failed nodes or nodes that exhibit suspect behaviors.
• Overhead: An overlay network typically consists of a heterogeneous body of de-
vices across the Internet. It is clear that the overlay network cannot be as efficient
as the dedicated routers in processing packets and messages. Moreover, the over-
lay network may not have adequate information about the Internet topology to
properly optimize routing processes.
Figure 1.3 presents a taxonomy of overlay systems. Overlays can be router-based, or
they can be completely implemented on top of the underlay, typically TCP/IP. Router-based
overlays typically employ IP Multicast [107, 130] and IP Anycast [106] features;
however, given that deployment of the next version of the IP protocol, IPv6 [106],
has not progressed according to the most optimistic expectations, these extensions are not
globally supported on the Internet. If the routers only provide basic unicast end-to-end
communication, information networking functions need to be provided by the overlay.
FIGURE 1.3
Taxonomy of overlay networks. Labels in the figure: overlay multicast; router-based (IP multicast); no router support; infrastructure-centric (CDNs); end-systems with infrastructure support; end-systems only.
Content delivery networks (CDNs) are examples of overlay networks that cache and
store content and allow efficient and less costly ways to distribute data on a massive
scale. CDNs typically do not require changes to end-systems, and they are not P2P
solutions from the viewpoint of the end clients.
The two remaining categories illustrated in Figure 1.3 are end-systems with and without
infrastructure support, respectively. The former combines fixed infrastructure with soft-
ware running in the end-systems in order to realize efficient data distribution. The latter
category does not involve fixed infrastructure, but rather establishes the overlay network
in a decentralized manner.
Overlay networks allow the introduction of more complex networking functionality on
top of the basic IP routing functionality. Filter-based routing, onion routing,
distributed hash tables (DHTs), and trigger-based forwarding are examples of new kinds of
communication paradigms. DHTs are a class of decentralized distributed algorithms that
offer a lookup service. DHTs store (key, value) pairs, and they support the lookup of the
value associated with a given key. The keys and values are distributed in the system, and
the DHT system must ensure that the nodes have sufficient information of the global state
to be able to forward and process lookup requests properly.
The DHT algorithm is responsible for distributing the keys and values in such a way that
efficient lookup of the value corresponding to a key becomes possible. Since peer nodes
may come and go, this requires that the algorithm be able to cope with changes in the
distributed system. In addition, the locality of data plays an important part in all overlays,
since they are executed on top of an existing network, typically the Internet. The overlay
should take the network locations of the peers into account when deciding where data is
stored, and where messages are sent, in order to minimize networking overhead.
Figure 1.4 illustrates the key DHT API functions that allow peers to insert, look up,
and remove values associated with a key. Typically, the key is a hash value, a so-called flat
label, which essentially realizes a flat namespace that can be used by the DHT algorithm to
optimize processing.
DHTs are a class of decentralized distributed systems. They provide a logically
centralized lookup service similar to hash tables. A DHT stores (key, value) pairs
and allows a client to retrieve a value associated with a given key. The DHT is
typically realized as a structured P2P network in which peers cooperate to provide
the service across the Internet.
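The put, get, and delete operations of the DHT API can be sketched with a single-process stand-in in which nodes sit on a hash ring and each key is stored on the first node whose identifier follows the key's identifier. The class name `ToyDHT`, the node names, and the choice of SHA-1 are assumptions made for illustration; the semantics of `delete(key, value)` (remove the pair only if the stored value matches) are also an assumption, since the text gives only the signature. A real DHT would additionally handle network messaging, node churn, and replication.

```python
import bisect
import hashlib


def flat_label(text):
    """Hash a string to a 160-bit integer identifier (a flat label)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """In-memory illustration of a DHT that spreads keys across nodes."""

    def __init__(self, node_names):
        # Nodes are placed on the ring in order of their hashed names.
        self.ring = sorted((flat_label(name), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def _responsible_node(self, key):
        """Return the first node at or after the key's identifier, wrapping."""
        ids = [node_id for node_id, _ in self.ring]
        index = bisect.bisect_left(ids, flat_label(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.store[self._responsible_node(key)][key] = value

    def get(self, key):
        return self.store[self._responsible_node(key)].get(key)

    def delete(self, key, value):
        # Assumed semantics: remove the pair only if the value matches.
        bucket = self.store[self._responsible_node(key)]
        if key in bucket and bucket[key] == value:
            del bucket[key]


if __name__ == "__main__":
    dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
    dht.put("song.mp3", "peer-17")
    print(dht.get("song.mp3"))
    dht.delete("song.mp3", "peer-17")
    print(dht.get("song.mp3"))
```

Because both node names and keys are hashed into the same flat identifier space, the caller never needs to know which node holds a value; the service looks logically centralized even though the data is spread over the nodes.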
FIGURE 1.4
DHT API. Distributed applications call put(key, value), get(key), which returns a value, and delete(key, value) on the DHT, which balances keys and data across nodes.
There are two main classes of P2P networks, structured and unstructured. In the former
type, the overlay network topology is tightly controlled by the P2P system and content is
distributed in such a way that queries can be made efficiently. Many structured P2P systems
utilize DHT algorithms in order to map object identifiers to distributed nodes. Unstructured
P2P networks do not have such tightly controlled structure, but rather they utilize flooding
and similar opportunistic techniques, such as random walks and expanding-ring time-to-live
(TTL) search, for finding peers that host interesting data. Each peer receiving a query can
then evaluate the query locally using its own content. This allows unstructured P2P systems
to support more complex queries than are typically supported by structured DHT-based
systems.
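To make the cost and the incompleteness of flooding concrete, the following sketch simulates a TTL-limited flood and an expanding-ring search over a small in-memory graph. The function names and the dictionary-based graph representation are assumptions for illustration; a real protocol such as Gnutella adds message identifiers, reply routing, and much more.

```python
from collections import deque


def flood_search(neighbors, content, start, wanted, ttl):
    """Flood a query from `start`; return (peers holding `wanted`, messages sent)."""
    seen = {start}
    hits = [start] if wanted in content.get(start, ()) else []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in neighbors[peer]:
            messages += 1  # every forwarded copy costs one message
            if neighbor in seen:
                continue  # duplicate query, dropped by the receiver
            seen.add(neighbor)
            if wanted in content.get(neighbor, ()):
                hits.append(neighbor)  # evaluated locally against own content
            queue.append((neighbor, remaining - 1))
    return hits, messages


def expanding_ring_search(neighbors, content, start, wanted, max_ttl):
    """Retry the flood with a growing TTL until something is found."""
    for ttl in range(1, max_ttl + 1):
        hits, _ = flood_search(neighbors, content, start, wanted, ttl)
        if hits:
            return ttl, hits
    return None, []
```

With a TTL that is too small the search returns nothing even though the content exists, which is the lack of a completeness guarantee discussed in this section.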
Unstructured P2P algorithms are called first generation and the structured algorithms are
called second generation. They can also be combined to create hybrid systems. The key-based
structured algorithms have a desirable property: namely, that they can find data locations
within a bounded number of overlay hops [162]. The unstructured broadcasting-based
algorithms, although resilient to network problems, may have large routing costs due to
flooding, or may be unable to find available content [274].
Another approach to P2P systems is to divide them into two classes, pure and hybrid P2P
systems. In the former, each peer is simultaneously a client and a server, and the operation
is decentralized. In the latter class, a centralized component is used to support the P2P
network.
Figure 1.5 illustrates the inherent trade-off between completeness and expressiveness of
an overlay system. By completeness we mean the ability of the system to guarantee the
location and retrieval of a piece of data. Expressiveness pertains to the system’s ability
to reason about the data—for example, how complex queries can be used to locate data
elements. DHTs and other structured overlays typically guarantee completeness, whereas
unstructured systems, such as Gnutella and Freenet, do not provide such guarantees. As an
inherent limitation, structured systems support less complex queries, typically the lookup
of keys. Unstructured systems, on the other hand, can support complex query processing.
In this book, we cover both structured and unstructured systems and highlight their key
properties.
FIGURE 1.5
Balancing completeness and expressiveness in overlays. Axes: expressiveness and completeness
(from guarantees to no guarantees). Labels: key-based routing, key-based range queries,
attribute-based queries, content-based routing. Systems shown: DHT, hybrid, unstructured.
1.3 Applications
Many overlay networks have been proposed both in the research community and by Internet
and Web companies. Overlay networks can be categorized into the following classes [80]:
• P2P file sharing: For sharing media and data. For example, Napster, Gnutella,
KaZaA.
• CDN: Content caching to reduce delay and cost. For example, Akamai and LimeLight.
• Routing and forwarding: Reduce routing delays and cost, resiliency, flexibility. For
example, resilient overlay network (RON), Internet indirection infrastructure (i3).
• Security: To enhance end-user security and offer privacy protection. For exam-
ple, virtual private networks (VPNs), onion routing, anonymous content storage,
censorship resistant overlays.
• Experimental: Offer testing ground for experimenting with new technologies. For
example, PlanetLab.
• Other: Offer enhanced communications. For example, e-mail, VoIP, multicast,
publish/subscribe, delay tolerant operation, etc.
Currently a significant amount of content is being served using decentralized P2P overlays.
Most of the deployed algorithms are based on unstructured overlays. The unstructured P2P
protocol BitTorrent has become a popular content distribution protocol in recent years.
P2P technologies are not commonly used with CDNs; however, they are increasingly
used by end clients. P2P offers end client–assisted data distribution, in which clients acting
as peers upload data. This contrasts with the traditional client-server CDN model, in which
clients do not upload data. The main strength of P2P is in the delivery of massively popular
data items; however, items that fall into the long tail may not be cost-efficient to distribute
using P2P. This can be alleviated by storing data items on client machines using caching,
but this requirement is not favored by many users.
1.4 Properties of Data
In this section, we briefly discuss the properties of data [117, 120, 228]. Data can be charac-
terized in many ways. We consider an example taxonomy in Figure 1.6 that divides data
into two parts: stored data and real-time data.
Stored data consists of bits that are stored on a system on a more permanent basis in such a
way that the data can be made available later. This data can take two forms: it can be mutable
or immutable. Mutable data can be shared and modified by various entities either locally or
in the distributed environment. Mutable data can be made incrementally available, and it
can be created and managed by multiple entities. On the other hand, mutable data is not easy
to cache and it requires complicated security solutions, especially in distributed
environments. Immutable data means that the full data (for example, a picture or a video file) is
available, and it does not change. This data can therefore be cached and verified easily.
Real-time data is generated on the fly and transmitted over the network. The data is
packetized, possibly on multiple layers, and it is transferred hop-by-hop on a store-and-forward
basis. This means that, although individual packets of the data are stored in intermediate
buffers, the whole data is not stored as such. In addition, with real-time data, the time when
the data is inserted into the network plays a crucial part.
FIGURE 1.6
Taxonomy of data. Data divides into stored data and real-time data.
Stored data, mutable: data sharing; data incrementally available; secure operation needs
solutions; not easy to cache.
Stored data, immutable: full data is available; easy to cache; easy to verify.
Real-time data, streaming: data only incrementally available; cannot be cached.
Real-time data, signaling: data only incrementally available; cannot be cached.
Streaming data is only incrementally available, and only the latest packets of this stream
are important. This means that this kind of data cannot be cached. Another form of real-
time data is signaling. In this case, data also becomes incrementally available and cannot
be cached; however, the data packets are typically very different from streaming.
References play an important part in distributed systems. A reference encapsulates a re-
lationship between itself and a referent defined relative to the state of some physical system.
As examples we may consider memory addresses that point to some specific locations of
physical memory and universal resource locators (URLs) that point to Web resources located
on specific servers, available using a specific protocol such as the hypertext transfer proto-
col (HTTP). If the physical system changes—for example, memory is swapped or a server
is relocated—the referent changes as well. These so-called physical references may become
invalid when the environment changes.
In order to cope with changes in the environment, the common practice is to introduce
a level of indirection into the reference system. For example, the domain name system (DNS)
binds host names to IP addresses, which allows administrators to change IP addresses
without changing host names. The hierarchical and replicated structure of DNS scales well
for its intended purposes, and it is at the core of the Internet.
A data element can be either mutable or immutable. In the former case it can change,
and in the latter case it cannot change. It is obvious that a mutable data element can be
represented by a sequence (or a graph) of immutable data elements. Given that a piece of
data does not change, it can be uniquely and succinctly summarized using a hash function.
We note that hashes only provide probabilistic uniqueness; however, a long enough hash
bitstring results in a vanishingly small probability of collision.
A hash function is a function from a sequence of bytes to a fixed-size sequence of bits, a
bitstring. Hash functions can be characterized based on how easy it is to find a collision [227]:
• A hash function is strongly collision resistant if it is not computationally feasible to
find two different input data items which have the same hash.
• A hash function is weakly collision resistant if, for a given data item, it is computa-
tionally not feasible to find another data item that has the same hash.
• A hash function is probabilistically collision resistant if, for a given input data item,
the probability that a randomly chosen data item will have the same hash as the
input data item is extremely small.
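The claim that a long enough bitstring makes collisions vanishingly unlikely can be checked numerically. The sketch below assumes an ideal hash function whose outputs are uniformly distributed over b-bit strings, and uses the standard birthday approximation; the function names are chosen for this example only.

```python
import math


def random_match_probability(bits: int) -> float:
    """Chance that one random item hashes to a given b-bit value (ideal hash)."""
    return 2.0 ** -bits


def any_collision_probability(items: int, bits: int) -> float:
    """Birthday approximation: chance that any two of `items` hashes coincide."""
    pairs_over_space = items * (items - 1) / 2 ** (bits + 1)
    return -math.expm1(-pairs_over_space)
```

Under these assumptions, any_collision_probability(10**9, 160) stays far below any practical concern, whereas a 32-bit hash applied to the same number of items collides almost surely.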
Semantic-free references have been proposed to achieve persistence and freedom from
contention in a naming system [20, 339]. The idea is to use a reference namespace devoid of
explicit semantics—for example, based on hashed identifiers. This means that a reference
should not contain information about the organization, administrative domain, or network
provider. Flat semantic-free references contrast with DNS-based URLs because they have
no explicit structure. The semantic-free referencing method uses DHTs to map each object
reference to a machine that contains object metadata. The metadata typically includes the
object’s current network location and other information.
Until recently, there have been no good candidate solutions for resolving semantic-free
names in a scalable fashion in the distributed environment. The traditional solution has
been to use a partitioned set of context-specific name resolvers. The emerging overlay DHT
technology can be used to efficiently store and look up semantic-free references. Indeed,
the so-called self-certified flat labels have gained widespread adoption in recent overlay
systems.
Self-certifying data is data whose integrity can be verified by the client accessing
it [227]. A node inserting a file in the network or sending a packet calculates a
cryptographic hash of the content using a known hash function. This hashing
produces a file key that is included in the data. The node may also sign the hash
value with its private key and include its public key with the data. This additional
process allows other nodes to authenticate the original source of the data. When a
node retrieves the data using the hash of the data as the key, it calculates the same
hash function to verify that the data integrity has not been compromised.
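The integrity check described above can be sketched as follows. The example assumes SHA-256 as the known hash function and uses a plain dictionary in place of the network; the function names are hypothetical, and the optional signing step with a private key is omitted.

```python
import hashlib


def file_key(content: bytes) -> str:
    """Key of a data item: the hex SHA-256 digest of its content (an assumption)."""
    return hashlib.sha256(content).hexdigest()


def publish(store: dict, content: bytes) -> str:
    """Insert content under its own hash and return the key."""
    key = file_key(content)
    store[key] = content
    return key


def retrieve(store: dict, key: str) -> bytes:
    """Fetch content by key and verify it against the key before returning it."""
    content = store[key]
    if file_key(content) != key:
        raise ValueError("integrity check failed: content does not match its key")
    return content
```

Because the key is derived from the content, a client does not need to trust the node that served the data; any modification changes the hash and is detected on retrieval.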
A large part of the research and development on P2P systems has focused on data-
centric operation, which emphasizes the properties of the data instead of the location of
the data. Ideally, the clients of the distributed system are not interested in where a par-
ticular data item is obtained as long as the data is correct. The notion of data-centricity
allows the implementation of various dynamic data discovery, routing, and forwarding
mechanisms [274].
In content-based routing systems, hosts subscribe to content by specifying filters on mes-
sages. In content-based routing, the content of messages defines their ultimate destination in
the distributed system. Information subscribers use an interest registration facility provided
by the network to set up and tear down data delivery paths. Data-centric and content-based
communications are currently being investigated as possible candidates for Internet-wide
communications.
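As a rough illustration of filters defining the destination of a message, the sketch below models a single broker that matches message content against attribute-value filters. The class and method names are assumptions for this example; distributed content-based routing spreads this matching over many routers.

```python
class ContentRouter:
    """Single-broker stand-in for a content-based routing network."""

    def __init__(self):
        self.subscriptions = []  # (subscriber, filter) pairs

    def subscribe(self, subscriber, **constraints):
        # A filter is a set of attribute=value constraints on message content.
        self.subscriptions.append((subscriber, constraints))

    def unsubscribe(self, subscriber):
        self.subscriptions = [s for s in self.subscriptions if s[0] != subscriber]

    def route(self, message: dict):
        """Return the subscribers whose filter matches the message content."""
        return [
            subscriber
            for subscriber, constraints in self.subscriptions
            if all(message.get(k) == v for k, v in constraints.items())
        ]
```

The sender never names a recipient: the set of destinations follows entirely from the message content and the registered interests.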
1.5 Structure of the Book
After the introduction chapter that motivates overlay technology and outlines several ap-
plication scenarios, we start with an overview of networking technology in Chapter 2. This
chapter briefly examines the TCP/IP protocol suite and the basics of networking, such as
naming, addressing, routing, and multicast. The chapter forms the basis for the follow-
ing chapters, because typically TCP/IP is the underlay of the overlay networks and thus
understanding its features and properties is vital to the development of efficient overlay
solutions.
We discuss properties of networks in Chapter 3, including the growth of the Internet,
trends in networking, and how data can be modeled. Many of the overlay algorithms are
based on the observation that networks exhibit power law degree distributions. This can
then be used to create better routing algorithms.
In Chapter 4 we examine a number of unstructured P2P overlay networks. Many of
these solutions can be seen to be part of the first generation of P2P and overlay networks;
however, they can be also combined with structured approaches to form hybrid solutions.
We cover protocols such as Gnutella, BitTorrent, and Freenet and present a comparison of
them. This chapter places special emphasis on BitTorrent, because it has become the most
frequently used P2P protocol.
Chapter 5 presents the foundations of structured overlays. We consider various geome-
tries and their properties that have been used to create DHTs. The chapter also presents
consistent hashing, which is the basis for the scalability of many DHTs. After surveying
the foundations and basic cluster-based solutions, we then examine a number of structured
algorithms in Chapter 6. Structured overlay technologies place more assumptions on the
way nodes are organized in the distributed environment. We analyze algorithms such as
Plaxton’s algorithm, Chord, Pastry, Tapestry, Kademlia, CAN, Viceroy, Skip Graphs,
and others. The algorithms are based on differing structures, such as hypercubes, rings,
tori, butterflies, and skip graphs. The chapter considers also some advanced issues, such
as adding hierarchy to overlays.
Many P2P protocols and overlay networks utilize probabilistic techniques to reduce
processing and networking costs. Chapter 7 presents a number of frequently used and
useful probabilistic techniques. Bloom filters and their variants are of prime importance,
and they are heavily used in various network solutions. The chapter also examines epi-
demic algorithms and gossiping, which are also the foundation of a number of overlay
solutions.
As observed in this chapter, data-centric and content-centric operation offer new possi-
bilities regarding data caching, replication, and location. Recently, content-based routing
has become an active research area. In Chapter 8 we consider content-centric routing and
examine a number of protocols and algorithms. Special emphasis is placed on distributed
publish/subscribe, in which content is targeted to active subscribers.
Given the scalable and flexible distribution solutions enabled by P2P and overlay tech-
nologies, we are faced with the question of security risks. The authenticity of data and
content needs to be ensured. Required levels of anonymity, availability, and access con-
trol also must be taken into account. Chapter 9 examines the security challenges of P2P
and overlay technologies, and then outlines a number of solutions to mitigate the ex-
amined risks. Issues pertaining to identity, trust, reputation, and incentives need to be
analyzed.
Chapter 10 considers applications of overlay technology. Amazon’s Dynamo is considered
as an example of an overlay system used in a production environment that combines a
number of advanced distributed computing techniques. We also consider video-on-demand
(VoD) in this chapter. Much of the expected IP traffic increase in the coming years will come
from the delivery of video data in various forms. Video delivery on the Internet will see
a huge increase, and the volume of video delivery in 2013 is expected to be 700 times the
capacity of the US Internet backbone in 2000. The remainder of the chapter examines P2P
SIP for telecommunications signaling, and content distribution technologies.
Finally, we conclude in Chapter 11 and summarize the current state of the art in overlay
technology and the future trends. The chapter outlines the main usage cases for P2P and
overlay technologies for applications and services.
2 Network Technologies
This chapter examines the TCP/IP protocol suite and the basics of networking, such as
naming, addressing, routing, and multicast. The chapter forms the basis for the follow-
ing chapters, because typically TCP/IP is the underlay of the overlay networks and thus
understanding its features and properties is vital for the development of efficient overlay
solutions. The chapter places emphasis on interdomain routing, because it is key for scal-
able and policy-compliant global networking. Overlay solutions should ensure that the
underlay is used in an efficient and policy-compliant manner [203].
2.1 Networking
TCP/IP forms the basis of the current Internet, and it is generally described as having four
abstraction layers—namely, the link layer, network layer, transport layer, and application
layer. This layered view is often compared with the seven-layer OSI reference model. Design
principles, outlined in RFC 1122, have had a major influence on the development of the
current Internet [106]. The two key design principles for the Internet were [81] the end-to-end
principle and the robustness principle.
The end-to-end principle places the maintenance of state and overall intelligence at the
edges and assumes the core Internet retains no state [282]. Today’s real-world needs for
firewalls, network address translation (NAT), and web content caches have essentially made
this principle impossible to follow in practice.
The robustness principle can be summarized as follows: be conservative in what you do, be
liberal in what you accept from others. The principle suggests that Internet software developers
carefully write software that adheres closely to extant RFCs but accept and parse input from
clients that might not be consistent with those RFCs. As stated in RFC 1122, adaptability to
change must be designed into all levels of Internet host software.
The network layer in the TCP/IP model is responsible for realizing internetworking and
uses the IP protocol to deliver data from upper layers between end hosts. The protocol suite
separates host names from topological addresses by using name resolution. The domain
name system (DNS) is responsible for resolving hierarchical host names to topological IP
addresses [231]. This effectively separates naming from addressing, and even if the
naming system, namely DNS, fails, the underlying routing can still function independently.
DNS also allows the definition of organizational boundaries that are independent of the
network topology.
A routing algorithm is responsible for building and maintaining routing tables. A forwarding
algorithm is responsible for determining the next hop given a destination address. Packet
routing involves use of routing and forwarding algorithms and protocols for deciding where
an incoming packet should be sent. The two main classes are intradomain and interdomain
protocols. Intradomain protocols are applied in an autonomous system (AS)—for example,
a metropolitan area network (MAN) or regional network—and interdomain protocols are used
to connect the different AS together to form a global network topology. The typical exam-
ples of the protocols are open shortest path first (OSPF) for intradomain operation and border
gateway protocol (BGP) for interdomain operation.
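The forwarding step can be illustrated with longest-prefix matching, which is how IP forwarding selects among overlapping table entries. The sketch below uses Python's standard ipaddress module; the table contents and the next-hop names are made up for the example, and a production router would use a trie or hardware lookup rather than a linear scan.

```python
import ipaddress


def next_hop(forwarding_table, destination: str):
    """Return the next hop of the longest prefix that contains `destination`."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in forwarding_table:
        network = ipaddress.ip_network(prefix)
        if address in network:
            # Prefer the most specific (longest) matching prefix.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, hop)
    return best[1] if best else None
```

The routing algorithm is what would populate forwarding_table; the function above only consults it for each packet.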
The communications models offered by the Internet can be categorized into the following
cases. In unicasting, a packet traverses a sequence of links from a source to a destination. The
majority of traffic on the Internet is unicast. In multicasting, a packet selectively traverses
multiple chains of links from typically one source to multiple destinations. In broadcasting,
a packet is sent on multiple links to every device on the network. In practice, broadcast
is applied only within a specific broadcast domain. In anycasting, a suitable chain of links
is selected from a number of possible candidates. Packets are sent to the nearest or best
destination.
Of the above communication models, the currently dominant IP version 4 protocol sup-
ports only unicasting on a global scale. The next version of IP, version 6, offers these other
communication models as well; however, the IPv6 deployment has not progressed accord-
ing to some optimistic expectations, and it remains to be seen when the new protocol is
globally deployed.
The Internet is based on hierarchical routing, in which autonomous systems (ASs) are
connected by peering and transit links. Each AS can run its own local routing
algorithm, and BGP is used for interdomain connectivity.
Figure 2.1 illustrates the interoperable nature of the IP protocol. The network layer pro-
vides global addressing and end-to-end reachability, and thus abstracts the applications
from the details of routing and forwarding. The IP protocol supports a number of under-
lying links and physical layer protocols, which makes it the waist of the protocol stack.
Higher-level features diverge from the IP and support different operating environments.
The network layer therefore minimizes the number of service interfaces and maximizes
interoperability.
FIGURE 2.1
Hourglass model in networking. Labels: diverse applications, divergence, transport layer
(TCP/IP) at the waist, convergence, diverse physical layers.
Middleware provides additional services on top of the networking stack and below the
applications. Most overlay and P2P technologies can be thought to be part of middleware.
As middleware, they utilize the APIs and features of the underlying protocol stack and
network and offer their own APIs for application developers. The motivation for this layer
is that it can abstract many details pertaining to the underlying layers and thus make it
easier to develop and run distributed software.
TCP/IP applications use either a host name or an IP address. The former requires a DNS
lookup to resolve the IP address, whereas the latter is directly routable. Recently there have
been a number of proposals for adding further indirection into the protocol architecture
by means of locator-identity split. In general, the split would allow various identifiers—
for example, cryptographic identifiers [14, 188, 243]—to be mapped to IP addresses. The
motivation for locator-identity split is increased flexibility and de-emphasizing the central
role of IP addresses as end-point identifiers.
2.2 Firewalls and NATs
The present-day Internet has seen ubiquitous deployment of firewalls and network address
translators (NATs). Both are used to control data communications between subnetworks.
Firewalls are hardware or software components that block certain incoming connections.
Their main motivation is to increase security by preventing unauthorized connections to
a device. NAT devices, on the other hand, perform conversion between different address
spaces, typically private and public networks (Fig. 2.2). The motivation for NATs is that they
offer a certain level of security and allow the use of private IP address spaces, thus alleviating
IP address exhaustion concerns and some network management concerns as well.
A NAT involves the translation of an IP address used within one network to a
different IP address known within another network. Typically, a NAT maps its
local private network addresses to one or more global outside IP addresses and
then performs reverse mapping of the global IP addresses on incoming packets
back into private IP addresses.
Private address A and private address B sit behind a NAT with a public address.

Inside local IP addr.   Inside port   Out IP addr.   Out port
A                       1000          Public IP      2000
B                       1001          Public IP      2001

FIGURE 2.2
Example of network address translation.
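The mapping and reverse mapping of Figure 2.2 can be sketched as a small translation table. The class name, the sequential port allocation starting at 2000, and the documentation address 203.0.113.5 are assumptions for this example; real NAT devices also track protocols, timeouts, and remote endpoints.

```python
class Nat:
    """Port-translating NAT holding soft state for client-initiated flows."""

    def __init__(self, public_ip: str, first_port: int = 2000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (inside ip, inside port) -> outside port
        self.inbound = {}   # outside port -> (inside ip, inside port)

    def translate_out(self, inside_ip: str, inside_port: int):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        inside = (inside_ip, inside_port)
        if inside not in self.outbound:
            self.outbound[inside] = self.next_port
            self.inbound[self.next_port] = inside
            self.next_port += 1
        return self.public_ip, self.outbound[inside]

    def translate_in(self, outside_port: int):
        """Reverse-map an incoming packet, or return None if no mapping exists."""
        return self.inbound.get(outside_port)
```

An inbound packet for a port with no mapping has nowhere to go, which is why a host behind a NAT cannot simply accept unsolicited connections.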
There are a variety of NAT devices and a variety of network topologies utilizing NAT
devices in deployments. NAT devices support private IP addressing domains that are not
globally reachable. Typically, client-initiated connections create soft state in the NAT devices
so that responses to requests can be sent to the hosts in the private domain.
There are four general types of NAT devices, based on how they perform the address
mapping:
• Full cone NAT maps an internal address to an external address in one-to-one fash-
ion, and it is easy to traverse.
• Restricted cone NAT maps an internal address (and port) to an external address. Once
the internal client has sent a packet to an external host, the external host can send
packets back from any port.
• Port-restricted cone NAT is similar to the restricted cone NAT, but the external host
can only send from the port on which it received packets from the internal client.
• In symmetric NAT, each outgoing connection is mapped to a different external address
and port, and only an external host that has received packets from the internal
host can send packets back.
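The differences between the four types come down to two questions: which inbound packets a mapping lets through, and whether a mapping is reused across destinations. The sketch below encodes one common reading of this classification; the function names and the string labels are assumptions for the example, and deployed devices often behave in ways that do not fit the four categories cleanly.

```python
def accepts_inbound(nat_type, contacted, sender):
    """Decide whether a NAT mapping passes a packet from `sender` to the inside host.

    `contacted` is the set of (ip, port) external endpoints the inside host has
    already sent packets to through this mapping; `sender` is an (ip, port) pair.
    """
    if nat_type == "full cone":
        return True  # any external host may use the mapping once it exists
    if nat_type == "restricted cone":
        return sender[0] in {ip for ip, _ in contacted}  # known IP, any port
    if nat_type in ("port-restricted cone", "symmetric"):
        return sender in contacted  # exact IP and port must have been contacted
    raise ValueError(f"unknown NAT type: {nat_type}")


def mapping_key(nat_type, inside, destination):
    """Key under which the NAT stores the external mapping for an outgoing packet."""
    # Cone NATs reuse one mapping per inside endpoint; a symmetric NAT creates a
    # separate mapping (and external port) for every destination.
    return (inside, destination) if nat_type == "symmetric" else inside
```

The per-destination mapping is what makes symmetric NATs hard to traverse: a peer cannot learn in advance which external port will be used toward it.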
The asymmetric addressing and connectivity domains established by NAT devices have
created unique problems for P2P systems, which realize both client and server functionality
at end nodes. NATs may prevent P2P nodes from receiving inbound requests. Although
P2P systems build on the end-to-end communications capability of the Internet, in practice
the assumption that a peer can receive inbound traffic is often not valid.
A number of techniques have been devised for applications to detect the NATs on the
communication path and then configure the communications in such a way that the con-
nection can be established. The communication options depend on the type of NATs.
The worst case happens when there are symmetric NATs present, which map each out-
going connection to a new IP address and port number. This case is solved by using
a special rendezvous server that relays all packets between the communicating end-
points [302].
IETF has developed a number of NAT traversal solutions that include connection estab-
lishment (STUN), relaying (TURN), and combined solutions for SIP (ICE). The solutions are
surveyed and discussed in RFC 5128 [302]. Relaying is the most reliable method of realizing
NAT traversal; however, it is also the least efficient, because the relay server’s processing
power and network capacity is used to relay packets. Another technique is connection re-
versal for direct communication that works if only one of the two peers is behind a NAT
device. UDP and TCP hole punching can be used to punch holes through NAT devices and
establish direct connectivity between peers even when both hosts are behind NATs. Recent
analysis results indicate that UDP hole punching works on more than 80% of the
NAT devices. TCP hole punching is less widely supported, working on approximately 60%
of the devices.
P2P applications may use multiple rendezvous servers for registration, discovery, and
relay functions. As an example, Skype uses a central public server for login and a num-
ber of different public servers to realize end-to-end relay functionality. Recent studies
based on thousands of BitTorrent swarms indicate that roughly half of the peers can be
behind firewalls [232]. We return to the Skype and BitTorrent protocols in more detail in
Chapter 4.
2.3 Naming
Names and namespaces are fundamental components of network architectures. In the cur-
rent Internet, the DNS is responsible for managing the hierarchical domain namespace. The
DNS protocol was specified in the early 1980s by the IETF. Much of the flexibility of the
current Internet stems from the scalability of both network-level hierarchical routing and
the higher level naming service. DNS has facilitated the deployment of the World Wide
Web and e-mail.
DNS is a managed distributed overlay that uses a static distribution tree and a hierarchically
organized namespace. The DNS system is a distributed database system implemented
using the client-server model, in which the nameservers are responsible for sharing,
replicating, and partitioning the domain namespace, and for answering client requests. DNS
achieves scalability and resilience by relying extensively on caching and replication. As
a consequence, updates to DNS records typically require some time to become globally
available. Another limitation of DNS is that it does not have built-in security, which makes
it prone to a number of vulnerabilities.
The client-side uses a DNS resolver to look up information from DNS. DNS uses UDP
for typical requests and TCP for larger transfers. The DNS system supports two different
query modes, namely nonrecursive queries and recursive queries. A nonrecursive query
places the control at the requesting client, and typically a single DNS server provides only a
partial answer to the query. The client can then expand the partial answer by using other
nameservers that are identified in the partial answer. A recursive query, on the other hand,
places the control of the resolution process at the nameserver, which will then contact other
nameservers to answer the query. This latter mode is not a mandatory feature.
The namespace consists of domain names that are organized in a tree structure. Each
domain name in this tree has zero or more resource records that contain information about
the name. Each domain name is part of a DNS zone and has one or more authoritative DNS
servers. The root level of the hierarchy is served by the root nameservers, which are used
to look up a top-level domain name (TLD).
A DNS zone consists of a set of nodes served by an authoritative nameserver. Administrative
responsibility of a zone can be divided among multiple nameservers. Moreover, a single
nameserver can be responsible for multiple zones. Authority can be delegated for an arbi-
trary part of a zone, typically in the form of subdomains. In the case of delegation, the new
nameserver will become the authoritative nameserver for the delegated namespace.
The Internet Corporation for Assigned Names and Numbers (ICANN) oversees the reg-
istrar companies that maintain top-level domains. The domain names have a hierarchical
structure, and new hierarchy levels can be assigned under the top-level domains. The DNS
domain hierarchy is independent of network topology and network administrative do-
mains. This means that multiple names can be mapped to the same network and same
physical server. A name can also map to different IP addresses based on some policy, which
is useful in realizing load balancing. The separation of naming and addressing thus provides
flexibility by allowing more fine-grained policies to be implemented.
The DNS service has been designed to accept queries pertaining to host names and IP
addresses. A DNS client can perform a lookup to translate a hostname to an IP address,
translate an IP address to a hostname, and obtain published information about a host
(typically MX record for e-mail SMTP server details).
Figure 2.3 illustrates how DNS is used. When a client needs to obtain information about
a hostname, it sends a query to its local DNS server. The local DNS server consults its own
cache to check whether it already has the answer to the query. If the cache does not contain
the answer, the local DNS server forwards the query to other DNS servers. Once the DNS server
receives an answer, it can cache it before sending it to the client.
FIGURE 2.3
Overview of the domain name system. The namespace tree has the root at the top, top-level
domains such as fi, com, and uk below it, tkk and helsinki under fi, and cse under tkk. The
name servers shown are the root, fi, tkk.fi, and cse.tkk.fi name servers. Message sequence
between the DNS client (resolver) and the DNS name server: 1. Resolve host.cse.tkk.fi using
recursive query; 2. Query; 3. Referral; 4. Referral; 5. Query; 6. Answer.
We can take the lookup for cse.tkk.fi as an example. The local DNS server first queries
one of the public root nameservers to find the machines that are nameservers for the .fi
domain. Then the local DNS server queries the .fi domain nameservers to determine the
nameservers responsible for the tkk.fi domain. Finally, it queries the tkk.fi nameservers for the host or
Web server IP address.
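The referral chain described above can be sketched as a toy simulation. The server names, the zone data, and the address 192.0.2.10 (taken from a documentation range) are invented for illustration; a real resolver speaks the DNS wire protocol and honors record lifetimes when caching.

```python
# Toy model of iterative resolution: each "server" either answers
# or refers the resolver to the nameserver of a more specific zone.
# All names and the address below are illustrative assumptions.
SERVERS = {
    "root": {"referrals": {"fi": "fi-ns"}},
    "fi-ns": {"referrals": {"tkk.fi": "tkk-ns"}},
    "tkk-ns": {"answers": {"cse.tkk.fi": "192.0.2.10"}},
}

def resolve(name, server="root", cache=None):
    """Follow referrals from the root until some server answers.

    Returns (address, list of servers contacted).
    """
    cache = {} if cache is None else cache
    if name in cache:                      # local cache hit, no queries sent
        return cache[name], []
    trail = []
    while True:
        trail.append(server)
        zone = SERVERS[server]
        if name in zone.get("answers", {}):
            cache[name] = zone["answers"][name]
            return cache[name], trail
        # Pick the referral whose zone is a suffix of the queried name.
        nxt = [ns for z, ns in zone.get("referrals", {}).items()
               if name == z or name.endswith("." + z)]
        if not nxt:
            raise LookupError(f"no referral for {name} at {server}")
        server = nxt[0]
```

Calling `resolve("cse.tkk.fi")` contacts the root, fi, and tkk.fi servers in turn; a second call with the same `cache` dictionary returns without contacting any server.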
There are two main types of DNS activities: lookups and zone transfers. Lookups happen
when a DNS client, or a DNS server acting on behalf of a client, queries a DNS server for
information. Typically lookups involve finding the IP address for a given hostname, the
hostname for a given IP address, the name server responsible for a given domain, or the
mail server for a given host.
Zone transfers happen when a DNS server requests all records pertaining to a part of the
DNS naming hierarchy (the zone) from another DNS server. The requesting DNS server is
called the secondary server and the serving one is the primary server. Zone transfers are
expected to happen only among servers that should be replicated. Since DNS knows the
details of how a network is structured (the names and IP addresses), this information may
need to be protected.
2.4 Addressing
The Internet is based on hierarchical routing, which is reflected in its addressing system. The
network addresses are divided into two parts, namely the network and host parts. The
former defines the part of the network topology responsible for that address space, and
Network Technologies 19
the latter part defines the host. IPv4 has 32-bit addresses and the newer IPv6 extends this
to 128 bits, which is expected to be sufficient for current needs. In both IPv4 and IPv6 the
addressing space is divided into variable size prefixes.
Originally, there were three prefix classes, A, B, and C, corresponding to 8, 16, and 24 bits
for the network part in an address. The limitation of this model was that each prefix appeared
with host addresses included in global routing tables, resulting in scalability challenges. In
response to this growth crisis, classless interdomain routing (CIDR) was designed and deployed.
CIDR supports provider-aggregated addresses by allowing a variable-length network part
in an address. This allows better utilization of the existing address space, especially class
B networks, and the aggregation of routing table entries. CIDR has significantly reduced the size of the global
routing tables, and it is used in IPv4 and IPv6 [1].
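As a small illustration of variable-length prefixes and route aggregation, the following sketch uses Python's standard `ipaddress` module. The 10.1.x.0/24 networks are arbitrary example values, not taken from the text.

```python
import ipaddress

# Four contiguous /24 networks (arbitrary example values).
nets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent prefixes into the shortest covering set,
# which is the effect route aggregation has on a routing table.
aggregated = list(ipaddress.collapse_addresses(nets))
print(aggregated)  # [IPv4Network('10.1.0.0/22')]

# The prefix length splits an address into network and host parts.
net = ipaddress.ip_network("10.1.0.0/22")
print(net.prefixlen, net.num_addresses)         # 22 1024
print(ipaddress.ip_address("10.1.3.7") in net)  # True
```

Four routing table entries become one, and the 22-bit network part does not have to fall on a class A, B, or C boundary.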
2.5 Routing
In this section, we briefly outline the basic routing process and then examine interdomain
routing. We briefly present the border gateway protocol (BGP), examine some of the current
challenges for BGP, and finally consider compact routing, which is a family of routing
schemes that aim for scalability.
2.5.1 Overview
Routing in a static network is straightforward, having each router determine directions for
each possible destination. Routing in dynamic networks is more challenging, because the
routing tables change and routing instructions need to be computed at runtime. The key
questions are where the routing state resides and how often it needs to be updated.
The common approach is to broadcast routing state to all routers, which is exemplified
in link-state routing protocols that broadcast link-state updates that are used to compute
shortest path distances. To avoid excessive flooding of link-state updates, the common
solution is to divide the network into routing domains and use this hierarchy to limit the
propagation of link-state updates. Areas are used extensively in OSPF, in which they
are a network-dimensioning instrument. Hierarchies naturally occur in the interdomain
context, in which autonomous systems reflect administrative boundaries.
A routing process is responsible for computing the forwarding table of a node. The routing
process estimates the costs of incident links and communicates with its neighbors via these
links. A routing algorithm is the mechanism that defines what information is exchanged
with neighbors and how the forwarding tables are computed. The central purpose of a
routing algorithm is to maintain a forwarding configuration in which nodes are mutually
reachable by forwarding. It is often also desirable for the paths taken by forwarded packets
to be optimal or near-optimal [197].
The Internet is based on hierarchical routing. The seminal work by Kleinrock and Kamoun
published in 1977 showed how hierarchical clustering can be used to produce scalable
routing tables [187]. The key idea is to cluster nearby nodes together and then combine
clusters into superclusters, and continue this in a bottom-up hierarchical manner. As a
result, unnecessary topological information gets abstracted from the routing tables, and
the network scales well. Hierarchical routing results in routing table sizes on the order of
$\sqrt{n}$. Hierarchical routing is used today by a variety of protocols in both interdomain (BGP,
CIDR) and intradomain routing (OSPF).
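One way to see where a square-root table size can come from is a two-level sketch. The assumption here (made for illustration; the cited analysis treats general multi-level clustering) is that $n$ nodes are split into equal clusters of size $k$, and that each node keeps one entry per node in its own cluster plus one entry per remote cluster:

```latex
S(k) \approx k + \frac{n}{k}, \qquad
\frac{dS}{dk} = 1 - \frac{n}{k^{2}} = 0
\;\Rightarrow\; k = \sqrt{n}, \qquad
S(\sqrt{n}) \approx 2\sqrt{n}.
```

Under that assumption the table size grows as $\sqrt{n}$ rather than $n$; adding further levels to the hierarchy reduces it more.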
2.5.2 Interdomain
The interdomain structure has resulted from developments in both technology and business
models. It is a mixture of technological advances and business decisions driven by investors
and the stock market. A current trend has been toward massively popular content services
on the Internet. This has created pressure toward better network support of data delivery
and dissemination. The need to be able to deliver vast amounts of data in an efficient
and low-cost manner has given birth to CDNs and various peer-to-peer networks, such as
BitTorrent networks.
CDNs charge for the data delivery service and are typically based on proprietary, closed
solutions. BitTorrent and peer-to-peer networks, however, rely on peer-assisted data ex-
change. The latter rely on low-cost, mostly flat rate, connections between end-users and
their providers. This new network behavior has resulted in various anti-peer-to-peer measures
by Internet providers, partly due to the fact that many P2P protocols, such as BitTorrent,
do not take interdomain policies into account and thus are not ISP friendly.
The core Internet architecture was not designed to serve as critical communication infrastructure
for society. Therefore, the economic and political context must also be analyzed
and understood. The current question is whether viable economic models exist for
Internet service provision. Business modeling is complicated by regulatory background,
which varies by country. Telephone-carrier-based ISPs have been asking regulators for the
ability to charge differentially, based on the application and content of traffic. This kind
of discriminatory pricing may pose fundamental limitations for end users and limit their
freedom.
Figure 2.4 illustrates interdomain routing with a number of autonomous systems. Overlay
networks are implemented on top of the network layer topology as illustrated in the figure.
The current interdomain practice is based on three tiers, namely tiers 1, 2, and 3. Tier-1 is an IP
network that connects to the entire Internet using settlement-free peering. There are a small
number of tier-1 networks that typically seek to protect their tier-1 status. A tier-2 network
is a network that peers with some networks but relies on tier-1 for some connectivity, for
which it pays settlements. A tier-3 network is a network that only purchases transit from
other networks.
[Figure: node groups A (A1 to A5), B (B1 to B4), and C (C1 to C4); autonomous systems AS10 (transit) and AS20, AS30, AS40 (stubs); the legend distinguishes overlay nodes from regular nodes.]
FIGURE 2.4
Example of interdomain routing.
The three main categories of AS relationships are as follows [143]: customer-to-provider (C2P),
peer-to-peer (P2P), and sibling-to-sibling (S2S). In the C2P category, a customer AS pays a provider
AS for any traffic sent between the two. In the P2P category, two domains can freely
exchange traffic between themselves and their customers but do not exchange
traffic from or to their providers or other peers. In the S2S category, two domains
are part of the same organization and can freely exchange traffic between their
providers, customers, peers, or other siblings.
Gao’s work formulated the AS relationships inference problem. Gao assumed that every
BGP path must comply with the following hierarchical pattern: an uphill segment of zero or
more C2P or S2S links, followed by zero or one P2P links, followed by a downhill segment of
zero or more P2C or S2S links. Paths with this hierarchical structure are valley-free or valid.
Paths that do not follow this hierarchical structure are called invalid and may result from
BGP misconfiguration or from BGP policies that are more complex and do not distinctly fall
into the C2P/P2P/S2S classification [143]. According to recent measurements, BGP tables
miss up to 86.2% of the true AS adjacencies. The majority of these links are of the P2P type.
This means that peering links are likely to be more dominant than has been previously
reported or conjectured.
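The valley-free pattern can be checked mechanically. The sketch below assumes a path is already given as a list of link-type labels in traversal order; the label strings and the function name are choices made for this example, not part of Gao's formulation.

```python
def is_valley_free(links):
    """Check the valley-free pattern for a path given as link types.

    Each element is one of "c2p", "p2c", "p2p", "s2s", read in the
    direction the path is traversed.
    """
    phase = "uphill"            # uphill -> (optional peer link) -> downhill
    for link in links:
        if link == "s2s":
            continue            # sibling links are allowed in either segment
        if link == "c2p":
            if phase != "uphill":
                return False    # climbing again after the peak: a valley
        elif link == "p2p":
            if phase != "uphill":
                return False    # at most one peer link, and only at the peak
            phase = "downhill"
        elif link == "p2c":
            phase = "downhill"
        else:
            raise ValueError(f"unknown link type: {link}")
    return True
```

For example, `["c2p", "c2p", "p2p", "p2c"]` is valid, whereas `["p2c", "c2p"]` descends to a customer and then climbs again, which forms a valley.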
2.5.3 Border Gateway Protocol
The border gateway protocol (BGP) is responsible for connecting the different autonomous
systems together, and it is the key protocol for building and maintaining the global routing
table at interdomain routers. The current version of BGP is 4, and it incorporates support
for CIDR and route aggregation to improve scalability (RFC 4271).
BGP is realized as a manually configured overlay network that uses TCP connections
between peers. Routing updates propagate from peer to peer, and after receiving updates
a BGP router updates its interdomain routing table based on the new information (the
received path vectors).
BGP keeps a table of IP networks that are reachable either through peering links or transit
links. Each IP address, or prefix, is associated with a vector of AS numbers that indicates
the ASes that need to be traversed to reach the destination prefix. BGP is described as a path
vector protocol, since it is built on this notion of a vector of AS identifiers. Moreover, BGP
does not use intradomain metrics such as latency to make routing decisions; instead it uses
network policies and rule sets to decide what paths are used in routing and forwarding.
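The following sketch illustrates the path-vector idea in a much simplified form. It assumes each candidate route carries only an operator-assigned preference (in the spirit of BGP's LOCAL_PREF attribute) and an AS path; the real BGP decision process has further tie-breaking steps, and the AS numbers below are made up.

```python
def accept_route(my_as, as_path):
    """Path-vector loop prevention: reject paths that already contain us."""
    return my_as not in as_path

def best_route(candidates):
    """Pick a route by policy first, then by AS-path length.

    candidates: list of (local_pref, as_path) tuples, where local_pref
    is an operator-assigned preference (higher wins).
    """
    return max(candidates, key=lambda r: (r[0], -len(r[1])))

# Hypothetical routes to one prefix as seen by AS 10.
routes = [
    (100, [20, 40]),        # short path, but lower preference
    (200, [30, 50, 40]),    # longer path, preferred by policy
    (200, [10, 40]),        # contains AS 10 itself: would loop
]
usable = [r for r in routes if accept_route(10, r[1])]
print(best_route(usable))   # (200, [30, 50, 40])
```

The longer path wins because policy is evaluated before path length, which is why BGP routes are not necessarily shortest in any latency sense.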
2.5.4 Current Challenges
As a central component of the Internet, BGP is at the heart of the network and thus faces
increasing scalability challenges as the global network grows. BGP scalability concerns
stem from the observation that each interdomain router is expected to maintain routing
paths to all valid network prefixes. Currently, there are almost $3 \times 10^{5}$ prefixes [1] in the
global routing table, and this number is expected to grow in the near future because of site
multihoming and provider-independent addressing. In addition to the space requirements,
routing table updates pose several challenges. One is the frequency with which changes are
propagated in the global backbone. Another concern is routing update oscillation that may
result from router misconfiguration.
One way to alleviate BGP scalability concerns is to separate path selection from packet
forwarding. This is exemplified in the NIRA (a new interdomain routing architecture) proposal
that empowers users with the ability to choose a provider and domain level end-to-end
path [354]. The motivation for this is that only users know whether a path works. This
model creates competition between paths that different ISPs offer, because users can choose
the most suitable paths. In this model, the network comprises three parts for each sender
and receiver—namely, the core region (tier-1), the uphill region that covers all possible
paths from the sender to the core, and the downhill region covering all possible paths from
the core down to the receiver. Each region can have its own routing protocols.
Another recent proposal, the accountable internet protocol (AIP), replaces the subnet prefix
in IP packets with a self-certifying autonomous system identifier and a suffix that is a self-
certifying host identifier [9]. The key idea is to support domain-level routing instead of the
current prefix-based routing. The motivation is that there are fewer autonomous systems
than network prefixes. The proposal also combines domain-level routing with security by
using self-certified identifiers that make it easier to make network entities accountable.
The host identifiers are expected to be unique, which would support host mobility and
multi-homing in a seamless way.
2.5.5 Compact Routing
As mentioned above, BGP faces significant scalability challenges, and recent measurements
indicate that both the size of routing tables and the communication cost are increasing
exponentially [190]. Prefix optimization techniques, such as CIDR, do not appear to be the
most efficient solutions in the long run since they offer only a constant reduction in routing
table sizes and they do not change the scaling behaviour of the network.
Compact routing has been proposed as a candidate solution for decreasing routing table
sizes and improving network scalability. A routing scheme is said to be compact when it
results in logarithmic address and header sizes, sublinear routing table sizes, and a stretch
bounded by a constant. The compact routing schemes can be divided into two categories,
specialized and universal. The former works only on some specific graphs, and the latter
works on all graphs.
It has been shown that the classic link state, distance vector, and path vector routing
algorithms exhibit routing table sizes on the order of n log(n) [144] with stretch 1 (stretch
is the worst-case ratio of the routed path length to the shortest path length). Moreover, hierarchical routing performs
well only for graphs where large distances between nodes dominate. A universal stretch-1
compact routing algorithm also requires routing tables of size Ω(n log(n)) [144]. One interpretation of this result is
that shortest-path routing is incompressible, and to obtain smaller routing tables the stretch
must be allowed to increase above 1. The Cowen and the Thorup-Zwick schemes are two well-known
nonhierarchical stretch-3 compact routing schemes. These name-dependent schemes utilize
a set of landmarks to constrain updates and keep routing table sizes minimal. A routing
table consists of entries for the shortest paths to all landmarks and nodes in the local
cluster [144].
2.6 Multicast
Unicast is the dominant communication model for Internet applications. Multicast is the
process of sending data from typically one sender to multiple receivers. This typically
involves the creation of a multicast tree that is either source specific or shared by the
communicating entities.
In general, the creation of an optimal multicast tree is equivalent to the Steiner tree problem,
which is known to be NP-complete. This problem resembles the minimum spanning
tree problem; however, it considers only how to reach a specific subset of the nodes [348].
The multicast function can be implemented in the network level or it can be implemented
in the application layer. Network-level multicast complements unicast as a basic networking
primitive. Application-layer multicast, on the other hand, typically utilizes unicast. In this
section, we first examine IP multicast and then consider overlay multicast techniques.
2.6.1 Network-layer Multicast
Multicast is essentially a one-to-many data delivery mechanism. Network-layer (or IP) mul-
ticast provides the multicast capability in the form of special multicast address ranges that
are used by network routers to connect senders and receivers. Multicast differs significantly
from unicast in that it decouples the senders and receivers. Moreover, since there may be a
number of receivers for a multicast data packet, the network can optimize the transmission
by replicating packets at the last possible moment in the network.
IP multicast is a simple, scalable, and efficient mechanism to realize simple group-
based communication. IP multicast routes IP packets from one sender to multiple
receivers. Participants join and leave the group by sending a packet using the
IGMP (RFC 1112) protocol to a well-known group multicast address.
The key components of IP multicast are
• IP multicast group address
• A multicast distribution tree maintained by routers
• Receiver driven tree creation
In order to receive multicast packets, receivers join a specific IP multicast group. A mul-
ticast distribution tree is constructed and maintained by routers for the group. All packets
sent to the multicast IP address are then delivered by the multicast protocol to all receivers
that have joined the group.
A multicast protocol is responsible for maintaining multicast trees that connect the mem-
bers of multicast groups. There are two main categories of multicast algorithms, namely
source-based trees and shared trees. The former is rooted at the router serving the source
of multicast packets. This means that a tree is needed for each source; however, the trees
can be optimal in terms of some metric. The latter is rooted at a specific router, called a
rendezvous point (RP) or a core, that is responsible for maintaining the tree. In this case, the
source sends data packets to the RP, which then is responsible for disseminating the data
using the tree. The RP can then perform pruning operations on the tree to optimize the traffic.
Internet group management protocol (IGMP) is a protocol designed to allow the management
of IP multicast group memberships. IGMP is used by IP hosts and adjacent multicast
routers to establish and maintain multicast groups. According to RFC 3171, addresses
224.0.0.0 to 239.255.255.255 are designated as multicast addresses. IGMP messages are carried
directly in IP datagrams, whereas UDP is the common transport protocol for multicast application data. IP multicast, as IP in general,
is not reliable, and messages may be lost or delivered out of sequence.
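The address range above can be checked with Python's standard library, and the same library shows what a receiver hands to the operating system when it joins a group. This is a minimal sketch: the helper names and the group 239.1.2.3 are assumptions for illustration, and the commented-out socket calls need a live network interface.

```python
import ipaddress
import socket
import struct

def is_ipv4_multicast(addr):
    """True if addr lies in 224.0.0.0 to 239.255.255.255 (224.0.0.0/4)."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_multicast

def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure passed to setsockopt(IP_ADD_MEMBERSHIP)."""
    if not is_ipv4_multicast(group):
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))

# On a live host, a receiver would join the (example) group like this:
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
#                   membership_request("239.1.2.3"))
```

The application only states which group it wants; the host's network stack then takes care of signaling the membership to the adjacent multicast router.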
There are many different IP multicast protocols. The protocol-independent multicast (PIM)
is a frequently used protocol that supports several different operating modes, namely sparse
mode, dense mode, source-specific mode, and bidirectional mode. Several reliable multicast proto-
cols have been developed—for example, the pragmatic general multicast (PGM) that extends
IP multicast with loss detection and retransmission.
IP multicast groups are not very expressive. They partition the IP datagram address-
space, and each datagram belongs at most to one group. Moreover, IP multicast is a best-
effort unreliable service, and for many applications a reliable transport service is needed.
Multicast works well in closed networks; however, in large public networks multicast or
broadcast may not be practical. In these environments universally adopted standards such
as TCP/IP and HTTP may be better choices for all communication [168].
2.6.2 Application-layer Multicast
Given that IPv4 is still the prevailing network layer protocol and that it does not offer a
native multicast mechanism, it is common to implement multicast on top of the TCP/IP
protocol stack in the form of application-layer (or overlay) multicast. IP multicast requires
routers to maintain per-group state or per-source state for each multicast group. A routing
table entry is needed for each unique multicast group address, and the multicast addresses
are not easily aggregated. Moreover, IP multicast still requires additional reliability and
congestion control solutions.
Therefore, there is motivation for developing and deploying overlay multicast solutions.
Indeed, many of the systems discussed later in this book are examples of these. In this
section, we briefly outline the key motivation for application-layer multicast and the dif-
ferences to network-layer multicast.
An application-layer multicast system typically uses unicast communication between
nodes to realize one-to-many communications. Data packets are replicated
by the end hosts. These protocols may not be as efficient as IP multicast, because
data may be sent multiple times over the same link. As an example, in a previous
version of the Gnutella P2P protocol, one link was observed to be utilized six times
for the same data [273]. Nodes establish connections with each other
using either UDP or TCP and forward messages over these connections. The multicast tree
construction algorithm is typically distributed and can take various metrics into
account.
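The repeated use of one physical link for the same data can be made concrete with a small counting sketch. The topology, the host and router names, and the function name are hypothetical; the only idea taken from the text is that overlay edges are unicast paths that may share underlying links.

```python
from collections import Counter

def link_stress(overlay_edges, underlay_path):
    """Count how many times each physical link carries the same packet.

    overlay_edges: (parent, child) host pairs in the overlay tree.
    underlay_path: maps a host pair to the list of physical links used.
    """
    stress = Counter()
    for edge in overlay_edges:
        stress.update(underlay_path[edge])
    return stress

# Hypothetical topology: hosts A, B, C attached to routers R1 and R2.
paths = {
    ("A", "B"): ["A-R1", "R1-R2", "R2-B"],
    ("A", "C"): ["A-R1", "R1-R2", "R2-C"],
}
stress = link_stress([("A", "B"), ("A", "C")], paths)
print(stress["R1-R2"])  # 2: the same data crosses this link twice
```

With network-layer multicast the packet would cross the R1-R2 link once and be replicated at R2.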
Figure 2.5 compares IP multicast and overlay multicast in the following categories: de-
ployment, structure, transport, scalability, congestion control, and efficiency [174]. In terms
of deployment, IP multicast requires multicast-capable routers, whereas overlay multicast
| Category | IP Multicast | Overlay Multicast |
|---|---|---|
| Deployment | Multicast-capable routers | Deployed over the Internet |
| Multicast structure | Tree, interior nodes are routers, leaves are hosts | Typically a tree, both interior nodes of the structure and leaves are hosts |
| Transport layer protocol | UDP | TCP or UDP |
| Scalability | Limited | High (depends on solution) |
| Congestion control/recovery | No | Various, can utilize unicast (TCP) for node-to-node reliability |
| Efficiency | High | Low (varies), can suffer from high stretch and suboptimal interdomain routing |
| Example protocols | Protocol-independent multicast (PIM), Core-based trees (CBT), etc. | BitTorrent variants, Scribe, SplitStream, OverCast, etc. |

FIGURE 2.5
Comparison of IP and overlay multicast.
is based on hosts and can thus be deployed easily over the Internet. Both approaches are
based on trees, with the difference being that in IP multicast hosts do not participate in
the tree other than as leaves. As mentioned, IP multicast is not widely deployed and hence
its scalability is limited. It is, however, efficient, whereas overlay solutions may not utilize
optimal paths and may incur more overhead.
2.6.3 Chaining TCP Connections for Multicast
Intuition suggests that overlay multicast typically incurs a performance penalty over IP
multicast because of factors such as link stress, stretch factor, and end host packet process-
ing. For example, early versions of the Gnutella P2P protocol used TCP, but later versions
replaced it with UDP for performance reasons. Chains of TCP connections can offer an
opportunity to increase performance compared to direct unicast. This performance im-
provement comes from finding an alternative overlay path whose narrowest hop in the
chain (as perceived by TCP) is wider than the default path used by IP [192].
The expected TCP throughput as a function of the per-hop loss rates and RTTs can be
modeled using the following equation derived in [247]:
$$T = \frac{s}{rtt\left(\sqrt{\frac{2p}{3}} + 12\sqrt{\frac{3p}{8}}\,p\,(1 + 32p^{2})\right)} \approx \frac{\sqrt{1.5}}{rtt\,\sqrt{p}} \qquad (2.1)$$
This provides an estimate of the expected throughput T of a TCP connection in bytes/sec
as a function of the packet size s, the measured round-trip time rtt, and the steady state
loss event rate p.
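The throughput model can be evaluated numerically as sketched below. The sketch assumes the reading of Equation 2.1 in which the denominator is rtt multiplied by the sum of $\sqrt{2p/3}$ and $12\sqrt{3p/8}\,p\,(1+32p^2)$, and it treats the short approximation as counting packets per second, since the printed form does not contain s. The function names are choices made for this example.

```python
from math import sqrt

def tcp_throughput(s, rtt, p):
    """Full form of Equation 2.1: expected throughput in bytes/sec.

    s: packet size in bytes, rtt: round-trip time in seconds,
    p: steady-state loss event rate (0 < p <= 1).
    """
    denom = rtt * (sqrt(2 * p / 3)
                   + 12 * sqrt(3 * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom

def tcp_throughput_approx(rtt, p):
    """Small-loss approximation sqrt(1.5) / (rtt * sqrt(p)), in packets/sec."""
    return sqrt(1.5) / (rtt * sqrt(p))
```

For small loss rates the second term in the denominator is negligible, so `tcp_throughput(1, rtt, p)` and `tcp_throughput_approx(rtt, p)` nearly coincide; as p grows, the full form drops off faster.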
A given hop in a chain of TCP connections either has local network conditions that limit
its rate to a value below that of the upstream connections or is already limited by the rate of
the upstream connections. Following the methodology used in [361], the aggregate RTT is
defined as the sum of $rtt_i$ along the path and the aggregate loss rate is defined as $1 - \prod_i (1 - p_i)$
(assuming uncorrelated losses).
$$T \approx \frac{\sqrt{1.5}}{\left(\sum_i rtt_i\right)\sqrt{1 - \prod_i (1 - p_i)}} \qquad (2.2)$$
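The aggregate estimate can be computed directly from per-hop measurements. The sketch below assumes Equation 2.2 has the form $\sqrt{1.5}$ divided by the product of the summed per-hop RTTs and the square root of the aggregate loss rate; the function name and the (rtt, p) tuple layout are illustrative choices.

```python
from math import sqrt, prod

def chain_throughput(hops):
    """Equation 2.2 for a path described by per-hop (rtt_i, p_i) pairs."""
    total_rtt = sum(rtt for rtt, _ in hops)
    # Probability that a packet is lost on at least one hop,
    # assuming losses on different hops are uncorrelated.
    loss = 1 - prod(1 - p for _, p in hops)
    return sqrt(1.5) / (total_rtt * sqrt(loss))
```

With a single hop this reduces to the approximation in Equation 2.1, and every additional hop increases both the aggregate RTT and the aggregate loss rate, lowering the estimate.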
2.7 Network Coordinates
The latency of network communications is an important metric for choosing routes and
peers on the network. This raises the question of how accurately latency can be predicted
without prior communication. Recent network measurement systems indicate that latency
prediction is feasible based on synthetic network coordinates [91, 101, 320, 349]. A network
coordinate system might be used to select from among a number of replicated servers to
request a file.
Vivaldi is a distributed algorithm that assigns synthetic coordinates to Internet
hosts. It uses the Euclidean distance between the coordinates of two hosts to pre-
dict the network latency between them. In this system, each node computes its
coordinates by simulating its position in a network of physical springs. The sys-
tem does not require fixed infrastructure, and a new host can compute useful
coordinates after obtaining latency information from some other hosts [101].
2.7.1 Vivaldi Centralized Algorithm
When formulated as a centralized algorithm, the input to Vivaldi is a matrix of real network
latencies M, such that $M_{xy}$ is the latency between x and y. The output is a set of coordinates.
Finding the best coordinates is equivalent to minimizing the error (E) between predicted
distances and the supplied distances. Vivaldi uses a simple squared error function:
$$E = \sum_{x}\sum_{y}\left(M_{xy} - \mathrm{dist}(x, y)\right)^{2}, \qquad (2.3)$$
where dist(x, y) is the standard Euclidean distance between coordinates of x and y.
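The squared error function can be written in a few lines of Python. This is a sketch under two assumptions made here: the latency matrix is a dictionary of dictionaries with zero entries on the diagonal, and coordinates are tuples of floats.

```python
from math import dist

def squared_error(M, coords):
    """Equation 2.3: sum over all ordered pairs of (M[x][y] - dist(x, y))^2."""
    nodes = list(coords)
    return sum((M[x][y] - dist(coords[x], coords[y])) ** 2
               for x in nodes for y in nodes)

# Two hosts whose coordinates are exactly 5 units apart.
coords = {"a": (0.0, 0.0), "b": (3.0, 4.0)}
exact = {"a": {"a": 0.0, "b": 5.0}, "b": {"a": 5.0, "b": 0.0}}
print(squared_error(exact, coords))  # 0.0
```

If the measured latency were 6 instead of 5, each of the two ordered pairs would contribute an error of 1, giving E = 2.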
Vivaldi places a spring between each pair of nodes for which it knows the network latency,
with the rest length set to that latency. The length of each spring is the distance between the
current coordinates of the two nodes. The potential energy of a spring is proportional to the
displacement from its rest length squared: this displacement is identical to the prediction
error of the coordinates. Therefore, minimizing the potential energy of the spring system
corresponds to minimizing the prediction error E.
Vivaldi simulates the physical spring system by running the system through a series of
small time steps. At each time step, the force on each node is calculated and the node moves
in the direction of that force. The node moves a distance proportional to the applied force
and the size of the time step.
Each time a node moves it decreases the energy of the system; however, the energy of
the system stored in the springs will typically never reach zero, since network latencies do
not reflect a Euclidean space. Neither the spring relaxation nor some of the other solutions,
such as the simplex algorithm, is guaranteed to find the global minimal solution. Simu-
lating spring relaxation requires much less computation than more general optimization
algorithms.
2.7.2 Vivaldi Distributed Algorithm
In the distributed version of Vivaldi, each node simulates a piece of the overall spring
system. A node maintains an estimate of its own current coordinates, starting at the origin.
Whenever two nodes communicate, the two nodes measure the latency between them and
exchange their current synthetic coordinates. In RPC-based systems, this measurement can
be accomplished by timing the RPC; in a stream-oriented system, the receiver might echo
a timestamp.
Once a measurement is obtained, both nodes adjust their coordinates to reduce the mis-
match between the measured latency and the coordinate distance. A node moves its coor-
dinates toward a point p along the line between it and the other node. The point p is chosen
to be the point that reduces the difference between the predicted and measured latency
between the two nodes to zero. To avoid oscillation, a node moves its coordinates only a
fraction δ toward p.
A node initializes δ to 1.0 when it starts and reduces it each time it updates its coordinates.
Vivaldi starts with a large δ to allow a node to move quickly toward good coordinates and
ends up with a small δ to avoid oscillation. If two nodes have the same coordinates (the
origin, for instance), they each choose a random direction in which to move. Algorithm 2.1
illustrates the update procedure.
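The update step described above can also be sketched in runnable form. The class below is an illustration, not the reference implementation: it assumes two-dimensional coordinates, and it uses the decrement of 0.025 per sample and the floor of 0.05 for δ that appear in the Algorithm 2.1 pseudocode.

```python
import math
import random

class VivaldiNode:
    """One node's view of the spring system (2-D coordinates assumed)."""

    def __init__(self):
        self.coords = [0.0, 0.0]   # every node starts at the origin
        self.delta = 1.0           # initial step fraction

    def update(self, other_coords, latency):
        """Move a fraction delta toward the point that would zero the error."""
        direction = [o - m for o, m in zip(other_coords, self.coords)]
        length = math.hypot(*direction)
        if length == 0.0:
            # Same coordinates: pick a random direction to separate.
            angle = random.uniform(0.0, 2.0 * math.pi)
            direction = [math.cos(angle), math.sin(angle)]
        else:
            direction = [c / length for c in direction]
        # Positive when the nodes are too far apart in coordinate space.
        displacement = length - latency
        # Shrink delta on every sample, but never below 0.05.
        self.delta = max(0.05, self.delta - 0.025)
        self.coords = [m + c * displacement * self.delta
                       for m, c in zip(self.coords, direction)]
```

A node at the origin that measures a latency of 4 to a peer at (10, 0) sees a displacement of 6 and, with δ reduced to 0.975, moves to (5.85, 0).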
2.7.3 Applications
A modified Chord DHT (presented in Chapter 5) uses network coordinates to efficiently
build routing tables based on proximity so that lookups are likely to proceed to nearby
Algorithm 2.1 Pseudocode for the Vivaldi update procedure
Data: sc is the other host’s coordinates, sl is the one-way latency to that host, myc is this
host’s coordinates, the initial value of δ is 1.0.
Function: update(sc, sl)
    /* Unit vector toward other host */
    Vector dir = sc − myc
    dir = dir / length(dir)
    /* Distance from spring’s rest position */
    d = dist(sc, myc) − sl
    /* Displacement from rest position */
    Vector x = dir ∗ d
    /* Reduce δ at each sample */
    δ = δ − 0.025
    /* Stop at 0.05 */
    δ = max(0.05, δ)
    x = x ∗ δ
    /* Apply the force */
    myc = myc + x
nodes. A node receives a list of candidate nodes and selects the one that is closest in
coordinate space as its routing table entry; coordinates allow the node to make this decision
without probing each candidate.
The modified Chord utilizes coordinates when performing an iterative lookup. When
a node n1 initiates a lookup and routes it through some node n2, n2 chooses a next hop
that is close to n1 based on Vivaldi coordinates. In an iterative lookup, n1 sends an RPC to
each intermediate node in the route, so proximity to n1 is more important than proximity
to n2.
2.7.4 Triangle Inequality Violation
For a network coordinate system to work, it needs to properly reflect the latencies between
network hosts. When neighbour or peer selection is based on brute-force network measure-
ments, the quality of the selection cannot be affected by triangle inequality violations (TIV);
however, when the number of nodes grows, performing these brute-force measurements
may not be feasible. Then it is preferable to use a delay estimation system such as the network
coordinates discussed above. The potential challenge in using these systems is the
assumption that the triangle inequality holds in the delay space [340].
Any three nodes on the Internet A, B, and C form a triangle ABC. Edge AC is
considered to cause a triangle inequality violation if d(A, B) + d(B, C) < d(A, C),
where d(X, Y) is the measured delay between X and Y. The triangulation ratio of
the violation caused by AC in triangle ABC is defined as d(A, C)/(d(A, B) +
d(B, C)).
It has been demonstrated that TIVs can cause significant errors in latency estimation
based on network coordinate systems. As a potential remedy, a TIV alert mechanism has
been proposed that identifies edges with severe TIVs [340].
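Detecting violations in a small delay matrix is straightforward, as the sketch below shows. It assumes symmetric delays stored under `frozenset` keys and reads the triangulation ratio as d(A, C) divided by d(A, B) + d(B, C); the function name and the sample delays are made up, and this brute-force scan is not the TIV alert mechanism of [340].

```python
from itertools import permutations

def find_tivs(d):
    """Return (A, B, C, ratio) for every edge AC that violates the triangle
    inequality via B, that is, d(A,B) + d(B,C) < d(A,C).

    d maps frozenset({X, Y}) to the measured delay between X and Y.
    """
    nodes = sorted({n for pair in d for n in pair})
    out = []
    for a, b, c in permutations(nodes, 3):
        if a > c:
            continue                      # AC and CA are the same edge
        detour = d[frozenset((a, b))] + d[frozenset((b, c))]
        direct = d[frozenset((a, c))]
        if detour < direct:
            out.append((a, b, c, direct / detour))
    return out

# Hypothetical delays in milliseconds: the direct A-C edge is slower
# than the detour through B.
delays = {
    frozenset(("A", "B")): 10.0,
    frozenset(("B", "C")): 10.0,
    frozenset(("A", "C")): 30.0,
}
print(find_tivs(delays))  # [('A', 'B', 'C', 1.5)]
```

A ratio above 1 indicates a violation, and larger ratios indicate more severe ones. The scan is cubic in the number of nodes, so it is only practical for small measurement sets.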
2.8 Network Metrics
In this section, we examine metrics that characterize various properties of networks. Our
focus is, in particular, on metrics that are useful in the design and deployment of overlay
networks. We have already touched on this issue when discussing routing. First, we briefly
consider routing algorithm invariants, which are crucial for ensuring that the algorithms
perform according to the specifications. These invariants and properties do not assess how
well the paths maintained by a routing algorithm in a routing table perform. Therefore a
number of metrics are needed to understand the quality of the paths, the state of the routers
and nodes, and the properties of the network. We elaborate on the following metrics: shortest
path, routing table size, path stretch, forwarding load, churn, and several other metrics.
2.8.1 Routing Algorithm Invariants
The correctness and performance of a routing algorithm can be analyzed using a number of
metrics. Typically, a routing algorithm is expected to satisfy certain invariant properties
at all times. The two key properties are safety and liveness. The former
states that undesired effects do not occur; in other words, the algorithm works correctly. The
latter states that the algorithm continues to work correctly, for example, by avoiding deadlocks
and loops. These properties can typically be proven for a given routing algorithm under
certain assumptions.
Safety and liveness can also be specified in terms of soundness and completeness [197] for
a routing configuration. A configuration is sound if it includes paths for all node pairs that
are reachable (have a path) after the network becomes quiet. A degenerate form of this
configuration is one in which all nodes are unreachable. Completeness is used to ensure
that all paths in the network are included in the configuration. Together these properties
say that all nodes are reachable through the routing and forwarding system; however, they
do not determine how optimal the paths are. Therefore, additional metrics are needed to
assess the quality of the paths.
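One way to make these properties concrete is to check a set of next-hop tables against the topology. The sketch below is only an illustration: the dictionary layout and the names `reachable`, `delivers`, and `check_configuration` are assumptions, not part of any routing protocol. It reports node pairs that are connected in the graph but for which hop-by-hop forwarding loops or dead-ends.

```python
from collections import deque


def reachable(graph, src):
    """Return the set of nodes reachable from src (breadth-first search)."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def delivers(graph, tables, src, dst):
    """Follow next hops from src; True only if dst is reached without a loop."""
    node, visited = src, set()
    while node != dst:
        if node in visited:  # forwarding loop
            return False
        visited.add(node)
        nxt = tables.get(node, {}).get(dst)
        if nxt is None or nxt not in graph[node]:  # no entry, or not a real link
            return False
        node = nxt
    return True


def check_configuration(graph, tables):
    """List (src, dst) pairs that are connected but not correctly routed."""
    failures = []
    for src in graph:
        for dst in reachable(graph, src) - {src}:
            if not delivers(graph, tables, src, dst):
                failures.append((src, dst))
    return failures


# Hypothetical three-node line topology A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
good = {
    "A": {"B": "B", "C": "B"},
    "B": {"A": "A", "C": "C"},
    "C": {"A": "B", "B": "B"},
}
print(check_configuration(graph, good))  # []

# Misconfiguration: B sends traffic for C back to A, creating a loop.
looped = {**good, "B": {"A": "A", "C": "A"}}
print(sorted(check_configuration(graph, looped)))  # [('A', 'C'), ('B', 'C')]
```

A check of this kind says nothing about path quality; it only confirms that every reachable destination is eventually delivered to.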
2.8.2 Convergence
Soundness and completeness (or safety and liveness) do not consider how quickly the
routing algorithm works or converges when the network changes. They only ensure that
from the viewpoint of the system invariants, the operation is correct. Indeed, convergence
cost and time is an important metric for different kinds of routing systems, including overlay
algorithms.
The dynamics of peers joining and leaving an overlay system is called churn, and it is an
inherent property of P2P systems. Peer participation is highly dynamic. Typically, a large
part of the active peers are stable and the remaining peers change quickly [312]. This means
that P2P overlay networks must be designed in such a way that they tolerate high levels of
churn. Indeed, many of the algorithms presented in Chapter 6 tolerate churn.
2.8.3 Shortest Path
The goal of a routing algorithm is to find the shortest paths between two nodes,
A and B, that are reachable through the network. In order to do this, we need to have a
metric for calculating these shortest paths and then create routing tables that reflect the
paths according to distance. OSPF is an example of an intradomain protocol that com-
putes shortest paths using link state routing. On the other hand, BGP is an example of a
policy-based routing protocol that selects paths based on policies and AS hops instead
of, say, delay.
In general, the shortest path length between two nodes A and B is the minimum number
of edges that must be traversed to reach B from A. The average path length is the average of
the shortest path lengths between any two nodes. The average path length is a metric of
the number of hops to reach a node from a given source.
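As a minimal sketch of these two definitions, assuming an unweighted graph stored as an adjacency dictionary (the function names and the example ring are hypothetical), breadth-first search yields hop counts, and averaging them over all connected pairs gives the average path length:

```python
from collections import deque


def hop_counts(graph, src):
    """Shortest path length (in edges) from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_path_length(graph):
    """Mean shortest path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for src in graph:
        for dst, d in hop_counts(graph, src).items():
            if dst != src:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical four-node ring A - B - C - D - A.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(hop_counts(ring, "A")["C"])   # 2
print(average_path_length(ring))    # (1 + 2 + 1) / 3 = 1.333...
```

With link weights such as delay, the same idea applies, but breadth-first search would have to be replaced by a weighted shortest-path algorithm.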
2.8.4 Routing Table Size and Stretch
We can observe two conflicting goals in the design of routing algorithms, namely that the
network paths used by a given router should be as short as possible and, at the same time,
the routing table should be as small as possible. The two key metrics are the optimality of
the paths and the size of the routing tables.
The efficiency of a routing algorithm is measured in terms of its stretch factor—
that is, the maximum ratio between the length (or delay) of a route computed by
the algorithm and that of a shortest path (or delay) connecting the same pair of
nodes [251].
Stretch indicates how close the achieved performance is to the optimal choice. For
overlay systems, there is an inherent overhead compared to IP routing, accepted in exchange for
deployability and scalability. The treatment for overlay multicast is a bit more challenging.
Typically, the benchmark IP multicast tree would be assumed to consist of the optimal
unicast paths.
We can extend the notion of a stretch to a multicast overlay tree as follows.
Stretch for a multicast overlay tree is the ratio of the number of network-layer hops
(or delay) in the path from the sender to a receiver in the multicast overlay tree,
and the number of hops (or delay) required by the shortest unicast path between
these two nodes, averaged over all trees and paths.
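The two definitions of stretch differ mainly in how the per-pair ratios are aggregated: the unicast stretch factor takes the maximum, whereas the multicast variant averages. The sketch below illustrates both, assuming the route lengths and shortest-path lengths have already been measured; all names and numbers are hypothetical.

```python
def stretch_factor(route_len, shortest_len):
    """Maximum ratio of routed length to shortest length over all node pairs."""
    return max(route_len[pair] / shortest_len[pair] for pair in route_len)


def multicast_stretch(overlay_hops, unicast_hops):
    """Average, over (tree, receiver) pairs, of overlay hops / shortest unicast hops."""
    ratios = [overlay_hops[key] / unicast_hops[key] for key in overlay_hops]
    return sum(ratios) / len(ratios)


# Hypothetical unicast lengths (hops or delay) for three node pairs.
shortest_len = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 5}
route_len = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 5}
print(stretch_factor(route_len, shortest_len))        # 1.5, set by the pair (A, C)

# Hypothetical multicast tree "t1" with three receivers.
overlay_hops = {("t1", "r1"): 4, ("t1", "r2"): 6, ("t1", "r3"): 9}
unicast_hops = {("t1", "r1"): 4, ("t1", "r2"): 3, ("t1", "r3"): 6}
print(multicast_stretch(overlay_hops, unicast_hops))  # (1.0 + 2.0 + 1.5) / 3 = 1.5
```

A stretch of 1.0 means the route is as short as the best available path; larger values quantify the overhead.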
In addition to stretch, we have the routing table size as the other important metric. The
routing table should hold only a fraction of nodes in the network, and the routing algorithm
should not require global information about the nodes. For overlay networks, the aim is
to support routing tables whose size is sublinear in the number of nodes in the network
(and the number of items in the network). The routing table data structure should also be
efficiently realized using hardware and software.
These two metrics are in conflict, and a routing algorithm needs to balance between the
size of the routing table and the optimality of the paths.
2.8.5 Forwarding Load
Another important metric is the forwarding load placed on routers in terms of packets,
connections, and messages. For an IP router, forwarding load is measured in terms of
incoming packets and the incurred per-packet delay. IP routers use hardware or software
routing tables to look up destination interfaces for a packet given the packet’s destination
prefix. If a router cannot handle all incoming packets, its queues will become full and it
will start to drop packets. This congestion is then handled at the edge of the network,
following the end-to-end principle, and congestion avoidance is implemented in transport
layer protocols, exemplified by the TCP congestion control algorithm.
Router forwarding load therefore is handled mostly at the edge for TCP/IP; however,
overlay nodes are typically end hosts themselves, which makes stress an issue that has to
be taken into consideration when designing an overlay algorithm. For an overlay node, forwarding
load can be viewed as the amount of traffic the node is processing at a particular
time or time interval. This traffic has many components, namely control traffic pertaining to
how the overlay network is structured (neighbors, super nodes, etc.) and the actual content.
In a multicast system, forwarding load can be expressed in terms of the branching fac-
tor (or replication factor) of each node. For overlay multicast systems, the load incurred
from multicast forwarding compared to network level forwarding can be defined to be the
number of identical packets sent by a node. For network layer multicast there is no redun-
dant packet replication; however, an overlay multicast scheme may result in a number of
unnecessary packet replications (called false positives).
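For a multicast overlay tree, the branching factor can be read directly from the tree structure. The following sketch assumes the tree is given as a mapping from each node to its children (a hypothetical representation) and counts how many copies of each packet a node must send:

```python
def forwarding_load(children):
    """Copies of each packet a node must send: one per child in the overlay tree."""
    return {node: len(kids) for node, kids in children.items()}


# Hypothetical overlay multicast tree: S forwards to A and B; A forwards to C, D, E.
tree = {"S": ["A", "B"], "A": ["C", "D", "E"], "B": [], "C": [], "D": [], "E": []}
load = forwarding_load(tree)
print(load["A"])                 # 3
print(max(load, key=load.get))   # A
```

Nodes with a high count are the ones most likely to become bottlenecks, since overlay nodes are typically end hosts with limited upstream capacity.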
2.8.6 Churn
Churn is a metric that is especially pertinent for P2P overlay systems. Churn pertains to the
rate of arrivals and departures in the system. Typically, a large part of the active peers are
stable and the remaining peers come and go. P2P overlay networks must be designed in
such a way that they tolerate high levels of churn. Recent analysis of churn indicates that,
overall, its characteristics are remarkably similar across different systems [271, 312].
Churn is an inherent property of P2P systems and describes the dynamics of peer
arrival and departure. High churn means that the system is highly dynamic, with
peers coming and going.
Two metrics have been commonly used for churn in file-sharing systems, namely a node’s
session time and lifetime. The session time is the duration between the node joining the
network and subsequently leaving it. The lifetime is the time between the node
first entering the network and leaving it permanently. These two metrics are
depicted in Figure 2.6. The availability of a node can be defined as the sum of the node’s
session times divided by its lifetime. In one study, it has been argued that the session times
of nodes in a DHT are more relevant than their lifetimes [271].
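The availability definition above amounts to a simple ratio. As a small sketch, assuming a node's sessions are recorded as chronologically ordered (join, leave) pairs (the function name and the sample times are hypothetical):

```python
def availability(sessions):
    """Sum of session times divided by lifetime.

    sessions: list of (join_time, leave_time) tuples in chronological order.
    """
    lifetime = sessions[-1][1] - sessions[0][0]
    online = sum(leave - join for join, leave in sessions)
    return online / lifetime


# Hypothetical node with two sessions, times in hours.
sessions = [(0, 4), (10, 16)]
print(availability(sessions))   # (4 + 6) / 16 = 0.625
```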
2.8.7 Other Metrics
Other important metrics that characterize a network include:
• Network diameter, which is the longest of the shortest-path distances between any pair of
nodes (some studies instead report the average shortest-path distance).
• Node degree, which is the number of links that the node has to other nodes in an
undirected graph. The degree distribution is connected with the robustness of the
network to node failures.
FIGURE 2.6
Session time in churn (timeline of Join and Leave events along a time axis; the lifetime spans from the first Join to the final Leave).
• Locality-awareness and the properties of data, which are important for data lookup
overlays and CDNs.
• Policy compliance, which is important for routing that takes place across organization
boundaries. BGP is the classic example of a policy-based routing protocol.
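The first two metrics in this list can be computed from the topology alone. The sketch below assumes a connected, undirected, unweighted graph stored as an adjacency dictionary; the function names and the example topology are hypothetical. It reports both the longest and the average shortest-path distance, together with node degrees.

```python
from collections import deque


def hop_counts(graph, src):
    """Shortest path length (in edges) from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def distance_summary(graph):
    """Return (longest, average) shortest-path distance over all node pairs."""
    dists = [d for s in graph for t, d in hop_counts(graph, s).items() if t != s]
    return max(dists), sum(dists) / len(dists)


def degrees(graph):
    """Number of links each node has to other nodes."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


# Hypothetical topology: H linked to A and B, and C attached to B.
graph = {"H": ["A", "B"], "A": ["H"], "B": ["H", "C"], "C": ["B"]}
longest, average = distance_summary(graph)
print(longest)          # 3 (the path A - H - B - C)
print(degrees(graph))   # {'H': 2, 'A': 1, 'B': 2, 'C': 1}
```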
3
Properties of Networks and Data
This chapter examines the salient properties of networks and data communicated over
the networks. We start with a characterization of data on the current Internet and discuss
the growth rate of the global network. Both geographical and logical distribution of data
are crucial when creating overlay networks over the Internet that ensure efficient data
availability. We discuss the role of power-laws and small-worlds in networking.
In order to engineer efficient overlay systems, a lot of information is needed pertaining
to the underlying network, the nodes and their characteristics, and the properties of the
data that they subscribe to, publish, and seek. This calls for various models, including models
of the actual traffic distributions on the Internet (including their spatial and temporal
characteristics), models of host connectivity, models of the dynamics of churn, and so on.
In this chapter, we outline some of the fundamental characteristics of overlay networks.
3.1 Data on the Internet
We are currently in the era of the exabyte in terms of annual IP traffic [78] and
entering the era of the zettabyte ($10^{21}$ bytes). Cisco’s latest traffic forecast for 2009–2013
indicates that annual global IP traffic will reach 667 exabytes in 2013 [79]. The
traffic is expected to increase by some 40% per annum. Much of this increase comes
from the delivery of video data in various forms.
Figure 3.1 presents Cisco’s forecast estimates for monthly global IP traffic until 2011. Ac-
cording to these estimates, the Internet is growing fast. We can compare this estimate with
the situation in 2005 when the global traffic was a bit over 2000 petabytes per month. The
forecast predicts an approximately eightfold increase in monthly traffic volume.
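The eightfold figure can be related to an annual growth rate with a short calculation. The sketch below uses round numbers taken from the text as assumptions (about 2000 petabytes per month in 2005 and eight times that in 2011); the function name is hypothetical.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


# Assumed round numbers read from the text: about 2000 PB/month in 2005
# and an eightfold increase by 2011, six years later.
rate = annual_growth_rate(2000, 8 * 2000, 2011 - 2005)
print(f"{rate:.1%}")   # 41.4%
```

An eightfold increase over six years therefore corresponds to roughly 41% compound growth per year, which is in line with the growth of some 40% per annum quoted earlier in this section.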
Figure 3.2 compares monthly traffic estimates for a number of content providers. The
growth of data-intensive services is evident in the amount of traffic transmitted per month.
We observe that Google and YouTube have by far the greatest bandwidth requirements.
The estimate for US traffic for these two services in mid-2007 surpassed the US Internet
backbone traffic at year end 1998, and the worldwide estimate was over seven times larger than
that backbone figure. This gives an idea of the radical growth of the Internet in the last 10 years.
3.1.1 Video Delivery
Video delivery on the Internet is anticipated to see a huge increase, and the volume of video
delivery is expected to be 700 times the capacity of the US Internet backbone in 2000. Cisco’s
study anticipates that video traffic will account for 91% of all consumer traffic in 2013.
FIGURE 3.1
Cisco’s global IP traffic forecast estimates 2005–2011 (vertical axis: petabytes per month, 0 to 20,000; horizontal axis: years 2005 to 2011).
The increasing video-related traffic creates a number of challenges for network engi-
neering. Video files are typically large, and with the advent of high-definition content they
will be even larger. This means that even a small adoption of a video delivery technology
can result in significant shifts in traffic patterns. This unpredictability of traffic patterns
makes network provisioning more difficult and may result in decreased quality of service
for customers [145].
Flash crowds contribute to the unpredictability of the network. A flash crowd happens
when a certain video or Web site becomes, typically unexpectedly, massively popular [119].
Flash crowds can be alleviated by using content replication and caching schemes. Another
technique frequently used by Web sites is to detect unexpected demand for content and
simplify the Web content to make it smaller.
Video delivery poses new challenges and opportunities for Internet service providers.
Video consumes bandwidth, and with the emergence of flat rates consumers do not pay
per megabyte. Moreover, the content may come from anywhere on the Internet, which may
result in increased interdomain traffic charges for the ISP. This means that service revenue
is no longer related to the connectivity revenue.
FIGURE 3.2
Monthly traffic estimates for content services.

Service                                                    Terabytes per month
BBC (UK) April 2007                                                        196
Yahoo (UK) April 2007                                                      353
Time Warner (US) May 2007                                                 1129
ABC, NBC, ESPN, Disney (US) May 2007                                      1854
Yahoo (US) May 2007                                                       2361
iTunes audio and video downloads (2006)                                   3500
Google (UK) April 2007                                                    3709
MySpace (US) May 2007                                                     4148
US Internet backbone at year end 1998                                     6000
Google and YouTube (US) May 2007                                         10956
Google and YouTube (worldwide mid-2007 Cisco estimate)                   45750
3.1.2 P2P Traffic
According to the Cisco study, peer-to-peer traffic will continue to grow but will account for a
smaller share of Internet traffic than it does today. In 2009, P2P systems
are transferring 3.3 exabytes of data per month. The study indicates that the P2P
share of consumer Internet traffic will drop to 20% by 2013, down from 50% at
the end of 2008. Even though the P2P share may diminish, most video delivery solutions,
accounting for much of the traffic increase, will utilize overlay technologies, which makes
this area crucial for ensuring efficient and scalable services.
3.1.3 Trends in Networking
Figure 3.3 presents a number of significant trends in IP networking and outlines their
challenges and potential solutions. Current trends include P2P, Internet broadcast, both
Internet and commercial video-on-demand (VoD), and high-definition content.
P2P presents a number of challenges for IP networks because it increases traffic and
utilizes upstream for data exchange. This changes the customary usage of the network in
which downstream dominates the traffic model. Therefore, IP networks need to be provi-
sioned in such a way that possible upstream bottlenecks are eliminated. Caching can be
seen as a potential solution to P2P traffic. Indeed, many current P2P protocols are able
to take network proximity into account so that data can be obtained from a nearby P2P
node.
Internet broadcast pertains to the dissemination of large media files or streams. Flash
crowds are challenging because they make it difficult to provision the network in such a
way that it can handle the expected demand for the content. This can be alleviated by using
P2P content distribution technologies and multicast technologies. Since there is no global IP
multicast support available, network layer multicast needs to be used in specific networks,
such as metropolitan area networks or wireless access networks.
Internet VoD is becoming increasingly popular, and thus the growth of the traffic is a
challenge for the network. This mostly affects the metropolitan area networks and the
core networks. The solutions include CDNs and increasing the network capacity. Data
compression can also be used to reduce the size of the media files, and VoD content
lends itself to caching. Commercial VoD is typically delivered in the metropolitan
FIGURE 3.3
Trends, challenges, and potential solutions for IP traffic.

Trend: P2P
  Challenges: Growth in traffic, upstream bottlenecks
  Solutions: P2P caching
Trend: Internet Broadcast
  Challenges: Flash crowds
  Solutions: P2P content distribution, multicast technologies
Trend: Internet Video-on-Demand
  Challenges: Growth in traffic, especially metropolitan area and core
  Solutions: Content Delivery Networks (CDNs), increasing network capacity, compression
Trend: Commercial Video-on-Demand
  Challenges: Growth in traffic in the metropolitan area network
  Solutions: CDNs, increasing network capacity, compression
Trend: High-definition content
  Challenges: Access network IPTV bottleneck, growth in VoD traffic volume in the metropolitan area network
  Solutions: CDNs, increasing network capacity, compression
area network, which needs to be provisioned accordingly. The core network is not burdened
much by commercial VoD, because the content can be replicated to the relevant metropolitan area networks.
High-definition content also poses challenges, because the higher quality radically increases the amount
of data that needs to be transferred. Access networks are constrained by
their IP television (IPTV) solution. CDNs and increasing the network capacity as well as
compression are potential remedies.
3.2 Zipf’s Law
A power-law implies that small occurrences are extremely common, whereas large
instances are extremely rare. This regularity or law is also referred to as Zipf or
Pareto. Zipf’s law is interesting for networked systems, because it has been shown
that many different activities follow this law—for example, query distributions
and Web site popularity. The linguist George Zipf first proposed the law in 1935
in the context of word frequencies in languages. For Web sites, the Zipf law means
that large sites get disproportionately more traffic than smaller sites.
In this section, we give an overview of the Zipf distribution and two related distributions,
namely Pareto and power-law distributions. Then we briefly discuss the implications for
the Internet and P2P.
3.2.1 Overview
The Zipf distribution is concerned with the ranking of objects based on their popularity.
The ranking is done by assigning the most popular object the rank of one, the second most
popular object a rank of two, and so on. Zipf’s law states that if objects are ranked according
to the frequency of occurrence, the frequency of occurrence F is related to the rank of the
object R according to the relation
$$F \sim R^{-\beta}, \qquad (3.1)$$
where the constant $\beta$ is close to one.
The simplest verification of the applicability of Zipf’s law is to plot the rank-ordered
list of objects versus the frequency of the object on a log-log scale. On a log-log scale,
the observance of a straight line is indicative of the applicability of Zipf’s law. The Zipf
distribution and power-law distributions are directly related, and they are different ways
of looking at the same phenomenon [5]. Zipf is used to model rank distributions and
power-law to model frequency distributions.
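The log-log check described above can be approximated numerically by fitting a straight line to log-frequency against log-rank. The following is a minimal sketch using ordinary least squares on synthetic data; the function name is hypothetical, all frequencies are assumed to be positive, and a least-squares fit is only a rough indicator rather than a rigorous test for a power law.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    ranked = sorted(frequencies, reverse=True)  # rank 1 = most frequent
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic, exactly Zipfian counts with exponent 1 (not measured data).
counts = [1000 / rank for rank in range(1, 51)]
print(round(-loglog_slope(counts), 3))   # 1.0
```

The negated slope is an estimate of the exponent in Equation (3.1); for real measurements the points scatter around the line, and the quality of the fit should be inspected as well.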
The Zipf distribution is related to the Pareto distribution. Pareto was interested in the
distribution of income, with the question of how many people have an income greater than
x. Pareto’s law is defined in terms of the cumulative distribution function (CDF). The Pareto
distribution gives the probability that a person’s income is greater than or equal to x:
$$P[X \geq x] \sim x^{-k}. \qquad (3.2)$$
A power-law distribution in its typical usage gives the number of people whose income
is exactly x rather than how many people have an income greater than x. It is the probability
distribution function (PDF) associated with the CDF given by Pareto’s law,
$$P[X = x] \sim x^{-(k+1)}, \qquad (3.3)$$
where k is the Pareto distribution shape parameter.
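The relations among Equations (3.1) to (3.3) can be made explicit with a short derivation. This is a sketch of the standard argument, treating x as continuous and ignoring constant factors; the symbol $p(x)$ for the density is introduced here for illustration.

```latex
% Pareto's law gives the tail probability; its negative derivative is the density.
P[X \geq x] \sim x^{-k}
\quad\Longrightarrow\quad
p(x) = -\frac{d}{dx}\, P[X \geq x] \sim k\, x^{-(k+1)} \sim x^{-(k+1)}.

% Link to Zipf: the object of rank R has frequency F, and R is proportional
% to the number of objects whose frequency is at least F.
R \sim P[X \geq F] \sim F^{-k}
\quad\Longrightarrow\quad
F \sim R^{-1/k},
\qquad \text{so } \beta = 1/k .
```

Under this reading, a Zipf exponent close to one corresponds to a Pareto shape parameter close to one and a power-law exponent close to two.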
hereinafter mentioned, in so far as concerns them, on condition that the
special stipulations contained in this article are fulfilled by Germany.
Postal Conventions:
Conventions and agreements of the Universal Postal Union
concluded at Vienna, July 4, 1891.
Conventions and agreements of the Postal Union signed at
Washington, June 15, 1897.
Conventions and agreements of the Postal Union signed at Rome
May 26, 1906.
Telegraphic Conventions:
International Telegraphic Conventions signed at St. Petersburg July
10, (22,) 1875.
Regulations and tariffs drawn up by the International Telegraphic
Conference, Lisbon, June 11, 1908.
Germany undertakes not to refuse her assent to the conclusion by the
new States of the special arrangements referred to in the conventions
and agreements relating to the Universal Postal Union and to the
International Telegraphic Union, to which the said new States have
adhered or may adhere.
ARTICLE 284.—From the coming into force of the present treaty the
high contracting parties shall apply, in so far as concerns them, the
International Radio-Telegraphic Convention of July 5, 1912, on
condition that Germany fulfills the provisional regulations which will
be indicated to her by the Allied and Associated Powers.
If within five years after the coming into force of the present treaty a
new convention regulating international radio-telegraphic
communications should have been concluded to take the place of the
convention of July 5, 1912, this new convention shall bind Germany
even if Germany should refuse either to take part in drawing up the
convention or to subscribe thereto.
This new convention will likewise replace the provisional regulations
in force.
ARTICLE 285.—From the coming into force of the present treaty the
high contracting parties shall apply in so far as concerns them and
under the conditions stipulated in Article 272 the conventions
hereinafter mentioned:
1. The conventions of May 6, 1882, and Feb. 1, 1889, regulating the
fisheries in the North Sea outside territorial waters.
2. The conventions and protocols of Nov. 16, 1887, Feb. 14, 1893, and
April 11, 1894, regarding the North Sea liquor traffic.
ARTICLE 286.—The International Convention of Paris of March 20,
1883, for the protection of industrial property, revised at Washington on
June 2, 1911; the International Convention of Berne of Sept. 9, 1886, for
the protection of literary and artistic works, revised at Berlin on Nov.
13, 1908, and completed by the additional protocol signed at Berne on
March 20, 1914, will again come into effect as from the coming into
force of the present treaty, in so far as they are not affected or modified
by the exceptions and restrictions resulting therefrom.
ARTICLE 287.—From the coming into force of the present treaty the
high contracting parties shall apply, in so far as concerns them, the
Convention of the Hague of July 17, 1905, relating to civil procedure.
This renewal, however, will not apply to France, Portugal and
Rumania.
ARTICLE 288.—The special rights and privileges granted to
Germany by Article 3 of the convention of Dec. 2, 1899, relating to
Samoa shall be considered to have terminated on Aug. 4, 1914.
ARTICLE 289.—Each of the Allied or Associated Powers, being
guided by the general principles or special provisions of the present
treaty, shall notify to Germany the bilateral treaties or conventions
which such Allied or Associated Power wishes to revive with Germany.
The notification referred to in the present article shall be made either
directly or through the intermediary of another power. Receipt thereof
shall be acknowledged in writing by Germany. The date of the revival
shall be that of the notification.
The Allied and Associated Powers undertake among themselves not
to revive with Germany any conventions or treaties which are not in
accordance with the terms of the present treaty.
The notification shall mention any provisions of the said conventions
and treaties which, not being in accordance with the terms of the
present treaty, shall not be considered as revived. In case of any
difference of opinion, the League of Nations will be called on to decide.
A period of six months from the coming into force of the present
treaty is allowed to the Allied and Associated Powers within which to
make the notification.
Only those bilateral treaties and conventions which have been the
subject of such a notification shall be revived between the Allied and
Associated Powers and Germany; all the others are and shall remain
abrogated.
The above regulations apply to all bilateral treaties or conventions
existing between all the Allied and Associated Powers signatories to the
present treaty and Germany, even if the said Allied and Associated
Powers have not been in a state of war with Germany.
ARTICLE 290.—Germany recognizes that all the treaties,
conventions, or agreements which she has concluded with Austria,
Hungary, Bulgaria, or Turkey since Aug. 1, 1914, until the coming into
force of the present treaty are and remain abrogated by the present
treaty.
ARTICLE 291.—Germany undertakes to secure to the Allied and
Associated Powers, and to the officials and nationals of the said powers,
the enjoyment of all the rights and advantages of any kind which she
may have granted to Austria, Hungary, Bulgaria, or Turkey, or to the
officials and nationals of these States by treaties, conventions, or
arrangements concluded before Aug. 1, 1914, so long as those treaties,
conventions, or arrangements remain in force.
The Allied and Associated Powers reserve the right to accept or not
the enjoyment of these rights and advantages.
ARTICLE 292.—Germany recognizes that all treaties, conventions, or
arrangements which she concluded with Russia or with any State or
Government of which the territory previously formed a part of Russia,
or with Rumania before Aug. 1, 1914, or after that date until the coming
into force of the present treaty, are and remain abrogated.
ARTICLE 293.—Should an Allied or Associated Power, Russia, or a
State or Government of which the territory formerly constituted a part
of Russia have been forced since Aug. 1, 1914, by reason of military
occupation or by any other means or for any other cause, to grant or to
allow to be granted by the act of any public authority, concessions,
privileges, and favors of any kind to Germany or to a German national,
such concessions, privileges, and favors are ipso facto annulled by the
present treaty.
No claims or indemnities which may result from this annulment shall
be charged against the Allied or Associated Powers or the powers,
States, Governments, or public authorities which are released from their
engagements by the present article.
ARTICLE 294.—From the coming into force of the present treaty
Germany undertakes to give the Allied and Associated Powers and
their nationals the benefit ipso facto of the rights and advantages of any
kind which she has granted by treaties, conventions or arrangements to
non-belligerent States or their nationals since Aug. 1, 1914, until the
coming into force of the present treaty so long as those treaties,
conventions, or arrangements remain in force.
ARTICLE 295.—Those of the high contracting parties who have not
yet signed, or who have signed but not yet ratified, the Opium
Convention signed at The Hague on Jan. 23, 1912, agree to bring the
said convention into force, and for this purpose to enact the necessary
legislation without delay and in any case within a period of twelve
months from the coming into force of the present treaty.
Furthermore, they agree that ratification of the present treaty should
in the case of powers which have not yet ratified the Opium
Convention be deemed in all respects equivalent to the ratification of
that convention and to the signature of the special protocol which was
opened at The Hague in accordance with the resolutions adopted by the
Third Opium Conference in 1914 for bringing the said convention into
force.
For this purpose the Government of the French Republic will
communicate to the Government of the Netherlands a certified copy of
the protocol of the deposit of ratifications of the present treaty, and will
invite the Government of the Netherlands to accept and deposit the
said certified copy as if it were a deposit of ratifications of the Opium
Convention and a signature of the additional protocol of 1914.
SECTION III.—Debts.
ARTICLE 296.—There shall be settled through the intervention of
clearing offices to be established by each of the high contracting parties
within three months of the notification referred to in paragraph (e)
hereafter the following classes of pecuniary obligations:
1. Debts payable before the war and due by a national of one of the
contracting powers, residing within its territory, to a national of an
opposing power, residing within its territory.
2. Debts which became payable during the war to nationals of one
contracting power residing within its territory and arose out of
transactions or contracts with the nationals of an opposing power,
resident within its territory, of which the total or partial execution
was suspended on account of the declaration of war.
3. Interest which has accrued due before and during the war to a
national of one of the contracting powers in respect of securities
issued by an opposing power, provided that the payment of
interest on such securities to the nationals of that power or to
neutrals has not been suspended during the war.
4. Capital sums which have become payable before and during the
war to nationals of one of the contracting powers in respect of
securities issued by one of the opposing powers, provided that the
payment of such capital sums to nationals of that power or to
neutrals has not been suspended during the war.
Copyright Harris & Ewing
M. Stephen Pichon
Chosen Chairman of the provisional
organization of the League of Nations in
recognition of his long leadership, not only
in France but internationally, in the work of
bringing about a world-wide organization to
preserve peace.
The proceeds of liquidation of enemy property, rights, and
interests mentioned in Section IV. and in the annex thereto will be
accounted for through the clearing offices, in the currency and at the
rate of exchange hereinafter provided in Paragraph (d), and
disposed of by them under the conditions provided by the said
section and annex.
The settlements provided for in this article shall be effected
according to the following principles and in accordance with the
annex to this section:
a. Each of the high contracting parties shall prohibit, as from the
coming into force of the present treaty, both the payment and
the acceptance of payment of such debts, and also all
communications between the interested parties with regard to
the settlement of the said debts otherwise than through the
clearing offices.
b. Each of the high contracting parties shall be respectively
responsible for the payment of such debts due by its nationals,
except in the cases where before the war the debtor was in a
state of bankruptcy or failure, or had given formal indication of
insolvency or where the debt was due by a company whose
business has been liquidated under emergency legislation
during the war. Nevertheless, debts due by the inhabitants of
territory invaded or occupied by the enemy before the armistice
will not be guaranteed by the States of which those territories
form part.
c. The sums due to the nationals of one of the high contracting
parties by the nationals of an opposing State will be debited to
the clearing office of the country of the debtor, and paid to the
creditor by the clearing office of the country of the creditor.
d. Debts shall be paid or credited in the currency of such one of the
Allied and Associated Powers, their colonies or protectorates, or
the British Dominions or India, as may be concerned. If the
debts are payable in some other currency they shall be paid or
credited in the currency of the country concerned, whether an
Allied or Associated Power, colony, protectorate, British
Dominion, or India, at the pre-war rate of exchange.
For the purpose of this provision the pre-war rate of exchange
shall be defined as the average cable transfer rate prevailing in
the Allied or Associated country concerned during the month
immediately preceding the outbreak of war between the said
country concerned and Germany.
If a contract provides for a fixed rate of exchange governing the
conversion of the currency in which the debt is stated into the
currency of the Allied or Associated country concerned, then the
above provisions concerning the rate of exchange shall not
apply.
In the case of new States the currency in which and the rate of
exchange at which debts shall be paid or credited shall be
determined by the Reparation Commission provided for in Part
VIII. (Reparation.)
e. The provisions of this article and of the annex thereto shall not
apply as between Germany on the one hand and any one of the
Allied and Associated Powers, their colonies or protectorates, or
any one of the British Dominions or India on the other hand,
unless within a period of one month from the deposit of the
ratifications of the present treaty by the power in question, or of
the ratification on behalf of such dominion or of India, notice to
that effect is given to Germany by the Government of such
Allied or Associated Power or of such Dominion or of India as
the case may be.
f. The Allied and Associated Powers who have adopted this article
and the annex hereto may agree between themselves to apply
them to their respective nationals established in their territory
so far as regards matters between their nationals and German
nationals. In this case the payments made by application of this
provision will be subject to arrangements between the allied and
associated clearing offices concerned.
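The conversion rule in Paragraph (d) can be illustrated with a short sketch. It is an illustration only and not part of the treaty text: the function names, the use of an unweighted mean over the quotations, the direction of the rate (units of the Allied or Associated currency per one unit of the currency in which the debt is stated), and all figures are assumptions made for the example.

```python
from decimal import Decimal


def pre_war_rate(cable_rates):
    """Average of the cable transfer rates quoted during the month
    preceding the outbreak of war (unweighted mean; an assumption,
    since the text does not say how the average is formed)."""
    if not cable_rates:
        raise ValueError("at least one quotation is required")
    return sum(cable_rates, Decimal(0)) / len(cable_rates)


def convert_debt(amount, cable_rates, contract_rate=None):
    """Convert a debt into the currency of the Allied or Associated
    country concerned.

    If the contract fixes a rate of exchange, that rate governs and the
    pre-war average is not applied; otherwise the pre-war average is used.
    """
    if contract_rate is not None:
        rate = contract_rate
    else:
        rate = pre_war_rate(cable_rates)
    return amount * rate


# Hypothetical quotations and amounts, chosen only to make the arithmetic easy.
quotes = [Decimal("0.0489"), Decimal("0.0490"), Decimal("0.0491")]
converted = convert_debt(Decimal("1000"), quotes)
converted_fixed = convert_debt(Decimal("1000"), quotes, contract_rate=Decimal("0.05"))
```

Decimal arithmetic is used here so that money amounts are not subject to binary floating-point rounding; the case of new States, where the Reparation Commission determines the currency and rate, is deliberately left out of the sketch.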
ANNEX
1. Each of the high contracting parties will, within three months
from the notification provided for in Article 296, Paragraph (e),
establish a clearing office for the collection and payment of
enemy debts.
Local clearing offices may be established for any particular
portion of the territories of the high contracting parties. Such
local clearing offices may perform all the functions of a central
clearing office in their respective districts, except that all
transactions with the clearing office in the opposing State must
be effected through the central clearing office.
2. In this annex the pecuniary obligations referred to in the first
paragraph of Article 296 are described as enemy debts, the
persons from whom the same are due as enemy debtors, the
persons to whom they are due as enemy creditors, the clearing
office in the country of the creditor is called the Creditor
Clearing Office, and the clearing office in the country of the
debtor is called the Debtor Clearing Office.
3. The high contracting parties will subject contraventions of
Paragraph (a) of Article 296 to the same penalties as are at
present provided by their legislation for trading with the
enemy. They will similarly prohibit within their territory all
legal process relating to payment of enemy debts, except in
accordance with the provisions of this annex.
4. The Government guarantee specified in Paragraph (b) of Article
296 shall take effect whenever, for any reason, a debt shall not
be recoverable, except in a case where at the date of the
outbreak of war the debt was barred by the laws of prescription
in force in the country of the debtor, or where the debtor was at
that time in a state of bankruptcy or failure or had given formal
indication of insolvency, or where the debt was due by a
company whose business has been liquidated under emergency
legislation during the war. In such case the procedure specified
by this annex shall apply to payment of the dividends.
The terms bankruptcy and failure refer to the application of
legislation providing for such juridical conditions. The
expression formal indication of insolvency bears the same
meaning as it has in English law.
5. Creditors shall give notice to the Creditor Clearing Office within
six months of its establishment of debts due to them, and shall
furnish the Clearing Office with any documents and
information required of them.
The high contracting parties will take all suitable measures to
trace and punish collusion between enemy creditors and
debtors. The clearing offices will communicate to one another
any evidence and information which might help the discovery
and punishment of such collusion.
The high contracting parties will facilitate as much as possible
postal and telegraphic communication at the expense of the
parties concerned and through the intervention of the clearing
offices between debtors and creditors desirous of coming to an
agreement as to the amount of their debt.
The Creditor Clearing Office will notify the Debtor Clearing
Office of all debts declared to it. The Debtor Clearing Office will,
in due course, inform the Creditor Clearing Office which debts
are admitted and which debts are contested. In the latter case
the Debtor Clearing Office will give the grounds for the
non-admission of debt.
6. When a debt has been admitted, in whole or in part, the Debtor
Clearing Office will at once credit the Creditor Clearing Office
with the amount admitted, and at the same time notify it of such
credit.
7. The debt shall be deemed to be admitted in full and shall be
credited forthwith to the Creditor Clearing Office unless within
three months from the receipt of the notification or such longer
time as may be agreed to by the Creditor Clearing Office notice
has been given by the Debtor Clearing Office that it is not
admitted.
8. When the whole or part of a debt is not admitted the two
clearing offices will examine into the matter jointly, and will
endeavor to bring the parties to an agreement.
9. The Creditor Clearing Office will pay to the individual creditor
the sums credited to it out of the funds placed at its disposal by
the Government of its country and in accordance with the
conditions fixed by the said Government, retaining any sums
considered necessary to cover risks, expenses, or commissions.
10. Any person having claimed payment of an enemy debt which is
not admitted in whole or in part shall pay to the clearing office,
by way of fine, interest at 5 per cent. on the part not admitted.
Any person having unduly refused to admit the whole or part
of a debt claimed from him shall pay, by way of fine, interest at 5
per cent. on the amount with regard to which his refusal shall be
disallowed.
Such interest shall run from the date of expiration of the period
provided for in Paragraph 7 until the date on which the claim
shall have been disallowed or the debt paid.
Each clearing office shall in so far as it is concerned take steps to
collect the fines above provided for, and will be responsible if
such fines cannot be collected.
The fines will be credited to the other clearing office, which shall
retain them as a contribution toward the cost of carrying out the
present provisions.
11. The balance between the clearing offices shall be struck
monthly, and the credit balance paid in cash by the debtor State
within a week.
Nevertheless, any credit balances which may be due by one or
more of the Allied and Associated Powers shall be retained until
complete payment shall have been effected of the sums due to
the Allied or Associated Powers or their nationals on account of
the war.
12. To facilitate discussion between the clearing offices each of them
shall have a representative at the place where the other is
established.
13. Except for special reasons all discussions in regard to claims
will, so far as possible, take place at the Debtor Clearing Office.
14. In conformity with Article 296, Paragraph (b), the high
contracting parties are responsible for the payment of the enemy
debts owing by their nationals.
The Debtor Clearing Office will therefore credit the Creditor
Clearing Office with all debts admitted, even in case of inability
to collect them from the individual debtor. The Governments
concerned will, nevertheless, invest their respective clearing
offices with all necessary powers for the recovery of debts which
have been admitted.
As an exception the admitted debts owing by persons having
suffered injury from acts of war shall only be credited to the
Creditor Clearing Office when the compensation due to the
person concerned in respect of such injury shall have been paid.
15. Each Government will defray the expenses of the clearing office
set up in its territory, including the salaries of the staff.
16. Where the two clearing offices are unable to agree whether a
debt claimed is due, or in case of a difference between an enemy
debtor and an enemy creditor, or between the clearing offices,
the dispute shall either be referred to arbitration if the parties so
agree under conditions fixed by agreement between them, or
referred to the mixed arbitral tribunal provided for in Section
VI. hereafter.
At the request of the Creditor Clearing Office the dispute may,
however, be submitted to the jurisdiction of the courts of the
place of domicile of the debtor.
17. Recovery of sums found by the Mixed Arbitral Tribunal, the
court, or the arbitration tribunal to be due shall be effected
through the clearing offices as if these sums were debts
admitted by the Debtor Clearing Office.
18. Each of the Governments concerned shall appoint an agent who
will be responsible for the presentation to the mixed arbitral
tribunal of the cases conducted on behalf of its clearing office.
This agent will exercise a general control over the
representatives or counsel employed by its nationals.
Decisions will be arrived at on documentary evidence, but it
will be open to the tribunal to hear the parties in person, or,
according to their preference, by their representatives approved
by the two Governments, or by the agent referred to above, who
shall be competent to intervene along with the party or to
reopen and maintain a claim abandoned by the same.
19. The clearing offices concerned will lay before the mixed arbitral
tribunal all the information and documents in their possession,
so as to enable the tribunal to decide rapidly on the cases which
are brought before it.
20. Where one of the parties concerned appeals against the joint
decision of the two clearing offices he shall make a deposit
against the costs, which deposit shall only be refunded when
the first judgment is modified in favor of the appellant and in
proportion to the success he may attain, his opponent in case of
such a refund being required to pay an equivalent proportion of
the costs and expenses. Security accepted by the tribunal may be
substituted for a deposit.
A fee of 5 per cent. of the amount in dispute shall be charged in
respect of all cases brought before the tribunal. This fee shall,
unless the tribunal directs otherwise, be borne by the
unsuccessful party. Such fee shall be added to the deposit
referred to. It is also independent of the security.
The tribunal may award to one of the parties a sum in respect of
the expenses of the proceedings.
Any sum payable under this paragraph shall be credited to the
clearing office of the successful party as a separate item.
21. With a view to the rapid settlement of claims, due regard shall
be paid in the appointment of all persons connected with the
clearing offices or with the Mixed Arbitral Tribunal to their
knowledge of the language of the other country concerned. Each
of the clearing offices will be at liberty to correspond with the
other, and to forward documents in its own language.
22. Subject to any special agreement to the contrary between the
Governments concerned, debts shall carry interest in accordance
with the following provisions:
Interest shall not be payable on sums of money due by way of
dividend, interest, or other periodical payments which
themselves represent interest on capital.
The rate of interest shall be 5 per cent. per annum except in
cases where, by contract, law, or custom, the creditor is entitled
to payment of interest at a different rate. In such cases the rate to
which he is entitled shall prevail.
Interest shall run from the date of commencement of hostilities
(or, if the sum of money to be recovered fell due during the war,
from the date at which it fell due) until the sum is credited to
the clearing office of the creditor.
Sums due by way of interest shall be treated as debts admitted
by the clearing offices and shall be credited to the Creditor
Clearing Office in the same way as such debts.
23. Where by decision of the clearing offices or the Mixed Arbitral
Tribunal a claim is held not to fall within Article 296, the
creditor shall be at liberty to prosecute the claim before the
courts or to take such other proceedings as may be open to him.
The presentation of a claim to the clearing office suspends the
operation of any period of prescription.
24. The high contracting parties agree to regard the decisions of the
Mixed Arbitral Tribunal as final and conclusive, and to render
them binding upon their nationals.
25. In any case where a Creditor Clearing Office declines to notify a
claim to the Debtor Clearing Office, or to take any step provided
for in this annex, intended to make effective in whole or in part
a request of which it has received due notice, the enemy creditor
shall be entitled to receive from the clearing office a certificate
setting out the amount of the claim, and shall then be entitled to
prosecute the claim before the courts or to take such other
proceedings as may be open to him.
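The interest rule in Paragraph 22 of the annex can be sketched as a small calculation. This is an illustration under stated assumptions, not a statement of how the clearing offices actually computed interest: simple (non-compounding) interest, a 365-day year, the function names, and the dates and amounts are all assumptions made for the example.

```python
from datetime import date
from decimal import Decimal

DEFAULT_RATE = Decimal("0.05")  # 5 per cent. per annum, per Paragraph 22


def interest_start(hostilities_began, fell_due):
    """Interest runs from the commencement of hostilities, or from the
    due date if the sum fell due during the war (that is, later)."""
    return max(hostilities_began, fell_due)


def simple_interest(principal, start, credited, annual_rate=DEFAULT_RATE):
    """Simple interest from `start` until the sum is credited to the
    clearing office of the creditor.

    Assumes an actual/365 day count; pass a different `annual_rate`
    where contract, law, or custom entitles the creditor to one.
    """
    days = (credited - start).days
    if days < 0:
        raise ValueError("credit date precedes the start of interest")
    return principal * annual_rate * Decimal(days) / Decimal(365)


# Hypothetical dates and amount, chosen so that exactly 365 days elapse.
start = interest_start(date(1915, 1, 1), date(1914, 6, 1))
interest = simple_interest(Decimal("1000"), start, date(1916, 1, 1))
```

Sums that themselves represent interest on capital (dividends, interest, or other periodical payments) are excluded by Paragraph 22 and would simply not be passed to the function.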
SECTION IV.—Property, Rights, and Interests
ARTICLE 297.—The question of private property, rights, and
interests in an enemy country shall be settled according to the
principles laid down in this section and to the provisions of the
annex hereto:
a. The exceptional war measures and measures of transfer (defined
in Paragraph 3 of the annex hereto) taken by Germany with
respect to the property, rights, and interests of nationals of
Allied or Associated Powers, including companies and
associations in which they are interested, when liquidation has
not been completed, shall be immediately discontinued or
stayed and the property, rights, and interests concerned
restored to their owners, who shall enjoy full rights therein in
accordance with the provisions of Article 298.
b. Subject to any contrary stipulations which may be provided for
in the present treaty, the Allied and Associated Powers reserve
the right to retain and liquidate all property, rights, and
interests belonging at the date of the coming into force of the
present treaty to German nationals, or companies controlled by
them, within their territories, colonies, possessions, and
protectorates including territories ceded to them by the present
treaty.
The liquidation shall be carried out in accordance with the laws
of the Allied or Associated State concerned, and the German
owner shall not be able to dispose of such property, rights, or
interests nor to subject them to any charge without the consent
of that State.
German nationals who acquire ipso facto the nationality of an
Allied or Associated Power in accordance with the provisions of
the present treaty will not be considered as German nationals
within the meaning of this paragraph.
c. The price or the amount of compensation in respect of the
exercise of the right referred to in the preceding Paragraph (b)
will be fixed in accordance with the methods of sale or valuation
adopted by the laws of the country in which the property has
been retained or liquidated.
d. As between the Allied and Associated Powers or their nationals
on the one hand and Germany or her nationals on the other
hand, all the exceptional war measures, or measures of transfer,
or acts done or to be done in execution of such measures as
defined in Paragraphs 1 and 3 of the annex hereto shall be
considered as final and binding upon all persons except as
regards the reservations laid down in the present treaty.
e. The nationals of Allied and Associated Powers shall be entitled
to compensation in respect of damage or injury inflicted upon
their property, rights, or interests including any company or
association in which they are interested, in German territory as
it existed on Aug. 1, 1914, by the application either of the
exceptional war measures or measures of transfer mentioned in
Paragraphs 1 and 3 of the annex hereto. The claims made in this
respect by such nationals shall be investigated, and the total of
the compensation shall be determined by the Mixed Arbitral
Tribunal provided for in Section VI, or by an arbitrator
appointed by that tribunal. This compensation shall be borne by
Germany, and may be charged upon the property of German
nationals, within the territory or under the control of the
claimant's State. This property may be constituted as a pledge
for enemy liabilities under the conditions fixed by Paragraph 4
of the annex hereto. The payment of this compensation may be
made by the Allied or Associated State, and the amount will be
debited to Germany.
f. Whenever a national of an Allied or Associated Power is entitled
to property which has been subjected to a measure of transfer in
German territory and expresses a desire for its restitution, his
claim for compensation in accordance with Paragraph (e) shall
be satisfied by the restitution of the said property if it still exists
in specie.
In such case Germany shall take all necessary steps to restore
the evicted owner to the possession of his property, free from all
incumbrances or burdens with which it may have been charged
after the liquidation, and to indemnify all third parties injured
by the restitution.
If the restitution provided for in this paragraph cannot be
effected, private agreements arranged by the intermediation of
the powers concerned or the clearing offices provided for in the
Annex to Section III. may be made, in order to secure that the
national of the Allied or Associated Power may secure
compensation for the injury referred to in Paragraph (e) by the
grant of advantages or equivalents which he agrees to accept in
place of the property, rights or interests of which he was
deprived.
Through restitution in accordance with this article the price or
the amount of compensation fixed by the application of
Paragraph (e) will be reduced by the actual value of the
property restored, account being taken of compensation in
respect of loss of use or deterioration.
g. The rights conferred by Paragraph (f) are reserved to owners
who are nationals of Allied or Associated Powers within whose
territory legislative measures prescribing the general liquidation
of enemy property, rights or interests were not applied before
the signature of the armistice.
h. Except in cases where, by application of Paragraph (f),
restitutions in specie have been made, the net proceeds of sales
of enemy property, rights or interests wherever situated carried
out either by virtue of war legislation, or by application of this
article, and in general all cash assets of enemies, shall be dealt
with as follows:
1. As regards powers adopting Section III. and the annex
thereto, the said proceeds and cash assets shall be credited
to the power of which the owner is a national, through the
clearing office established thereunder; any credit balance in
favor of Germany resulting therefrom shall be dealt with as
provided in Article 243.
2. As regards powers not adopting Section III. and the annex
thereto, the proceeds of the property, rights and interests,
and the cash assets, of the nationals of Allied or Associated
Powers held by Germany shall be paid immediately to the
person entitled thereto or to his Government; the proceeds
of the property, rights and interests, and the cash assets, of
German nationals received by an Allied or Associated
Power shall be subject to disposal by such power in
accordance with its laws and regulations and may be
applied in payment of the claims and debts defined by this
article or Paragraph 4 of the annex hereto. Any property,
rights and interests or proceeds thereof or cash assets not
used as above provided may be retained by the said Allied
or Associated Power and if retained the cash value thereof
shall be dealt with as provided in Article 243.
In the case of liquidations effected in new States, which are
signatories of the present treaty as Allied and Associated
Powers, or in States which are not entitled to share in the
reparation payments to be made by Germany, the proceeds
of liquidations effected by such States shall, subject to the
rights of the Reparation Commission under the present
treaty, particularly under Articles 235 and 260, be paid
direct to the owner. If on the application of that owner the
Mixed Arbitral Tribunal, provided for by Section VI. of this
part or an arbitrator appointed by that tribunal, is satisfied
that the conditions of the sale or measures taken by the
Government of the State in question outside its general
legislation were unfairly prejudicial to the price obtained,
they shall have discretion to award to the owner equitable
compensation to be paid by that State.
i. Germany undertakes to compensate its nationals in respect of
the sale or retention of their property, rights or interests in
Allied or Associated States.
j. The amount of all taxes and imposts upon capital levied or to be
levied by Germany on the property, rights, and interests of the
nationals of the Allied or Associated Powers from the 11th of
November, 1918, until three months from the coming into force
of the present treaty, or, in the case of property, rights or
interests which have been subjected to exceptional measures of
war, until restitution in accordance with the present treaty, shall
be restored to the owners.
ARTICLE 298.—Germany undertakes, with regard to the
property, rights and interests, including companies and associations
in which they were interested, restored to nationals of Allied and
Associated Powers in accordance with the provisions of Article 297,
Paragraph (a) or (f):
a. to restore and maintain, except as expressly provided in the
present treaty, the property, rights, and interests of the nationals
of Allied or Associated Powers in the legal position obtaining in
respect of the property, rights, and interests of German
nationals under the laws in force before the war.
b. not to subject the property, rights, or interests of the nationals of
the Allied or Associated Powers to any measures in derogation
of property rights which are not applied equally to the property,
rights, and interests of German nationals, and to pay adequate
compensation in the event of the application of these measures.
ANNEX
1. In accordance with the provisions of Article 297, Paragraph (d),
the validity of vesting orders and of orders for the winding up
of businesses or companies, and of any other orders, directions,
decisions, or instructions of any court or any department of the
Government of any of the high contracting parties made or
given, or purporting to be made or given, in pursuance of war
legislation with regard to enemy property, rights, and interests
is confirmed. The interests of all persons shall be regarded as
having been effectively dealt with by any order, direction,
decision, or instruction dealing with property in which they
may be interested, whether or not such interests are specifically
mentioned in the order, direction, decision, or instruction. No
question shall be raised as to the regularity of a transfer of any
property, rights, or interests dealt with in pursuance of any such
order, direction, decision, or instruction. Every action taken
with regard to any property, business, or company, whether as
regards its investigation, sequestration, compulsory
administration, use, requisition, supervision, or winding up, the
sale or management of property, rights, or interests, the
collection or discharge of debts, the payment of costs, charges or
expenses, or any other matter whatsoever, in pursuance of
orders, directions, decisions, or instructions of any court or of
any department of the Government of any of the high
contracting parties, made or given, or purporting to be made or
given in pursuance of war legislation with regard to enemy
property, rights or interests, is confirmed. Provided that the
provisions of this paragraph shall not be held to prejudice the
titles to property heretofore acquired in good faith and for value
and in accordance with the laws of the country in which the
property is situated by nationals of the Allied and Associated
Powers.
The provisions of this paragraph do not apply to such of the
above-mentioned measures as have been taken by the German
authorities in invaded or occupied territory, nor to such of the
above-mentioned measures as have been taken by Germany or
the German authorities since Nov. 11, 1918, all of which shall be
void.
2. No claim or action shall be made or brought against any Allied
or Associated Power or against any person acting on behalf of or
under the direction of any legal authority or department of the
Government of such a power by Germany or by any German
national wherever resident in respect of any act or omission
with regard to his property, rights, or interests during the war
or in preparation for the war. Similarly no claim or action shall
be made or brought against any person in respect of any act or
omission under or in accordance with the exceptional war
measures, laws, or regulations of any Allied or Associated
Power.
3. In Article 297 and this Annex the expression exceptional war
measures includes measures of all kinds, legislative,
administrative, judicial, or others, that have been taken or will
be taken hereafter with regard to enemy property, and which
have had or will have the effect of removing from the
proprietors the power of disposition over their property, though
without affecting the ownership, such as measures of
supervision, of compulsory administration, and of
sequestration; or measures which have had or will have as an
object the seizure of, the use of, or the interference with enemy
assets, for whatsoever motive, under whatsoever form or in
whatsoever place. Acts in the execution of these measures
include all detentions, instructions, orders or decrees of
Government departments or courts applying these measures to
enemy property, as well as acts performed by any person
connected with the administration or the supervision of enemy
property, such as the payment of debts, the collecting of credits,
the payment of any costs, charges, or expenses, or the collecting
of fees.
Measures of transfer are those which have affected or will affect
the ownership of enemy property by transferring it in whole or
in part to a person other than the enemy owner, and without his
consent, such as measures directing the sale, liquidation, or
devolution of ownership in enemy property, or the canceling of
titles or securities.
4. All property, rights, and interests of German nationals within
the territory of any Allied or Associated Power and the net
proceeds of their sale, liquidation or other dealing therewith
  • 5. Overlay Networks Toward Information Networking © 2010 Taylor and Francis Group, LLC
  • 6. OTHER TELECOMMUNICATIONS BOOKS FROM AUERBACH
    Broadband Mobile Multimedia: Techniques and Applications. Yan Zhang, Shiwen Mao, Laurence T. Yang, and Thomas M Chen. ISBN: 978-1-4200-5184-1
    Carrier Ethernet: Providing the Need for Speed. Gilbert Held. ISBN: 978-1-4200-6039-3
    Cognitive Radio Networks. Yang Xiao and Fei Hu. ISBN: 978-1-4200-6420-9
    Contemporary Coding Techniques and Applications for Mobile Communications. Onur Osman and Osman Nuri Ucan. ISBN: 978-1-4200-5461-3
    Converging NGN Wireline and Mobile 3G Networks with IMS: Converging NGN and 3G Mobile. Rebecca Copeland. ISBN: 978-0-8493-9250-4
    Cooperative Wireless Communications. Yan Zhang, Hsiao-Hwa Chen, and Mohsen Guizani. ISBN: 978-1-4200-6469-8
    Data Scheduling and Transmission Strategies in Asymmetric Telecommunication Environments. Abhishek Roy and Navrati Saxena. ISBN: 978-1-4200-4655-7
    Encyclopedia of Wireless and Mobile Communications. Borko Furht. ISBN: 978-1-4200-4326-6
    IMS: A New Model for Blending Applications. Mark Wuthnow, Jerry Shih, and Matthew Stafford. ISBN: 978-1-4200-9285-1
    The Internet of Things: From RFID to the Next-Generation Pervasive Networked Systems. Lu Yan, Yan Zhang, Laurence T. Yang, and Huansheng Ning. ISBN: 978-1-4200-5281-7
    Introduction to Communications Technologies: A Guide for Non-Engineers, Second Edition. Stephan Jones, Ron Kovac, and Frank M. Groom. ISBN: 978-1-4200-4684-7
    Long Term Evolution: 3GPP LTE Radio and Cellular Technology. Borko Furht and Syed A. Ahson. ISBN: 978-1-4200-7210-5
    MEMS and Nanotechnology-Based Sensors and Devices for Communications, Medical and Aerospace Applications. A. R. Jha. ISBN: 978-0-8493-8069-3
    Millimeter Wave Technology in Wireless PAN, LAN, and MAN. Shao-Qiu Xiao and Ming-Tuo Zhou. ISBN: 978-0-8493-8227-7
    Mobile Telemedicine: A Computing and Networking Perspective. Yang Xiao and Hui Chen. ISBN: 978-1-4200-6046-1
    Optical Wireless Communications: IR for Wireless Connectivity. Roberto Ramirez-Iniguez, Sevia M. Idrus, and Ziran Sun. ISBN: 978-0-8493-7209-4
    Satellite Systems Engineering in an IPv6 Environment. Daniel Minoli. ISBN: 978-1-4200-7868-8
    Security in RFID and Sensor Networks. Yan Zhang and Paris Kitsos. ISBN: 978-1-4200-6839-9
    Security of Mobile Communications. Noureddine Boudriga. ISBN: 978-0-8493-7941-3
    Unlicensed Mobile Access Technology: Protocols, Architectures, Security, Standards and Applications. Yan Zhang, Laurence T. Yang, and Jianhua Ma. ISBN: 978-1-4200-5537-5
    Value-Added Services for Next Generation Networks. Thierry Van de Velde. ISBN: 978-0-8493-7318-3
    Vehicular Networks: Techniques, Standards, and Applications. Hassnaa Moustafa and Yan Zhang. ISBN: 978-1-4200-8571-6
    WiMAX Network Planning and Optimization. Yan Zhang. ISBN: 978-1-4200-6662-3
    Wireless Quality of Service: Techniques, Standards, and Applications. Maode Ma and Mieso K. Denko. ISBN: 978-1-4200-5130-8
    AUERBACH PUBLICATIONS, www.auerbach-publications.com
    To Order Call: 1-800-272-7737 • Fax: 1-800-374-3401 • E-mail: orders@crcpress.com
  • 7. Overlay Networks: Toward Information Networking. Sasu Tarkoma
  • 8. Auerbach Publications, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
    © 2010 by Taylor and Francis Group, LLC. Auerbach Publications is an imprint of Taylor & Francis Group, an Informa business.
    No claim to original U.S. Government works. Printed in the United States of America on acid-free paper. 10 9 8 7 6 5 4 3 2 1
    International Standard Book Number-13: 978-1-4398-1373-7 (Ebook-PDF)
    This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
    Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
    For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
    Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the Auerbach Web site at http://www.auerbach-publications.com
  • 9. Contents
    Preface
    About the Author
    1 Introduction
      1.1 Overview
      1.2 Overlay Technology
      1.3 Applications
      1.4 Properties of Data
      1.5 Structure of the Book
    2 Network Technologies
      2.1 Networking
      2.2 Firewalls and NATs
      2.3 Naming
      2.4 Addressing
      2.5 Routing
        2.5.1 Overview
        2.5.2 Interdomain
        2.5.3 Border Gateway Protocol
        2.5.4 Current Challenges
        2.5.5 Compact Routing
      2.6 Multicast
        2.6.1 Network-layer Multicast
        2.6.2 Application-layer Multicast
        2.6.3 Chaining TCP Connections for Multicast
      2.7 Network Coordinates
        2.7.1 Vivaldi Centralized Algorithm
        2.7.2 Vivaldi Distributed Algorithm
        2.7.3 Applications
        2.7.4 Triangle Inequality Violation
      2.8 Network Metrics
        2.8.1 Routing Algorithm Invariants
        2.8.2 Convergence
        2.8.3 Shortest Path
        2.8.4 Routing Table Size and Stretch
        2.8.5 Forwarding Load
        2.8.6 Churn
        2.8.7 Other Metrics
    3 Properties of Networks and Data
      3.1 Data on the Internet
        3.1.1 Video Delivery
        3.1.2 P2P Traffic
  • 10. Contents (continued)
        3.1.3 Trends in Networking
      3.2 Zipf’s Law
        3.2.1 Overview
        3.2.2 Zipf’s Law and the Internet
        3.2.3 Implications for P2P
      3.3 Scale-free Networks
      3.4 Robustness
      3.5 Small Worlds
    4 Unstructured Overlays
      4.1 Overview
      4.2 Early Systems
      4.3 Locating Data
      4.4 Napster
      4.5 Gnutella
        4.5.1 Overview
        4.5.2 Searching the Network
        4.5.3 Efficient Keyword Lists
      4.6 Skype
      4.7 BitTorrent
        4.7.1 Torrents and Swarms
        4.7.2 Networking
        4.7.3 Choking Mechanism
        4.7.4 Antisnubbing
        4.7.5 End Game
        4.7.6 Trackerless Operation
        4.7.7 BitTorrent Vulnerabilities
        4.7.8 Service Capacity
        4.7.9 Fluid Models for Performance Evaluation
      4.8 Cross-ISP BitTorrent
      4.9 Freenet
        4.9.1 Overview
        4.9.2 Bootstrapping
        4.9.3 Identifier keys
        4.9.4 Key-based Routing
        4.9.5 Indirect Files
        4.9.6 API
        4.9.7 Security
      4.10 Comparison
    5 Foundations of Structured Overlays
      5.1 Overview
      5.2 Geometries
        5.2.1 Trees
        5.2.2 Hypercubes and Tori
        5.2.3 Butterflies
        5.2.4 de Bruijn graph
        5.2.5 Rings
        5.2.6 XOR Geometry
        5.2.7 Summary
      5.3 Consistent Hashing
  • 11. Contents (continued)
      5.4 Distributed Data Structures for Clusters
        5.4.1 Linear Hashing
        5.4.2 SDDS Taxonomy
        5.4.3 LH* Overview
        5.4.4 Ninja
    6 Distributed Hash Tables
      6.1 Overview
      6.2 APIs
      6.3 Plaxton’s Algorithm
        6.3.1 Routing
        6.3.2 Performance
      6.4 Chord
        6.4.1 Joining the Network
        6.4.2 Leaving the Network
        6.4.3 Routing
        6.4.4 Performance
      6.5 Pastry
        6.5.1 Joining and Leaving the Network
        6.5.2 Routing
        6.5.3 Performance
        6.5.4 Bamboo
      6.6 Koorde
        6.6.1 Routing
        6.6.2 Performance
      6.7 Tapestry
        6.7.1 Joining and Leaving the Network
        6.7.2 Routing
        6.7.3 Performance
      6.8 Kademlia
        6.8.1 Joining and Leaving the Network
        6.8.2 Routing
        6.8.3 Performance
      6.9 Content Addressable Network
        6.9.1 Joining the Network
        6.9.2 Leaving the Network
        6.9.3 Routing
        6.9.4 Performance
      6.10 Viceroy
        6.10.1 Joining the Network
        6.10.2 Leaving the Network
        6.10.3 Routing
        6.10.4 Performance
      6.11 Skip Graph
      6.12 Comparison
        6.12.1 Geometries
        6.12.2 Routing Function
        6.12.3 Churn
        6.12.4 Asymptotic Trade-offs
        6.12.5 Network Proximity
        6.12.6 Adding Hierarchy to DHTs
  • 12. Contents (continued)
        6.12.7 Experimenting with Overlays
        6.12.8 Criticism
    7 Probabilistic Algorithms
      7.1 Overview of Bloom Filters
      7.2 Bloom Filters
        7.2.1 False Positive Probability
        7.2.2 Operations
        7.2.3 d-left Counting Bloom Filter
        7.2.4 Compressed Bloom Filter
        7.2.5 Counting Bloom Filters
        7.2.6 Hierarchical Bloom Filters
        7.2.7 Spectral Bloom Filters
        7.2.8 Bloomier Filters
        7.2.9 Approximate State Machines
        7.2.10 Perfect Hashing Scheme
        7.2.11 Summary
      7.3 Bloom Filters in Distributed Computing
        7.3.1 Caching
        7.3.2 P2P Networks
        7.3.3 Packet Routing and Forwarding
        7.3.4 Measurement
      7.4 Gossip Algorithms
        7.4.1 Overview
        7.4.2 Design Considerations
        7.4.3 Basic Models
        7.4.4 Basic Shuffling
        7.4.5 Enhanced Shuffling
        7.4.6 Flow Control and Fairness
        7.4.7 Gossip for Structured Overlays
    8 Content-based Networking and Publish/Subscribe
      8.1 Overview
      8.2 DHT-based Data-centric Communications
        8.2.1 Scribe
        8.2.2 Bayeux
        8.2.3 SplitStream
        8.2.4 Overcast
        8.2.5 Meghdoot
        8.2.6 MEDYM
        8.2.7 Internet Indirection Infrastructure
        8.2.8 Data-oriented Network Architecture
        8.2.9 Semantic Search
        8.2.10 Distributed Segment Tree
        8.2.11 Semantic Queries
      8.3 Content-based Routing
      8.4 Router Configurations
        8.4.1 Basic Configuration
        8.4.2 Structured DHT-based Overlays
        8.4.3 Interest Propagation
      8.5 Siena and Routing Structures
  • 13. Contents ix 8.5.1 Routing Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 8.5.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 8.5.3 Siena Filters Poset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 8.5.4 Advertisements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 8.5.5 Poset-derived Forest. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .152 8.5.6 Filter Merging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .154 8.6 Hermes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 8.7 Formal Specification of Content-based Routing Systems . . . . . . . . . . . . . . . . . . . . . 158 8.7.1 Valid Routing Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .158 8.7.2 Weakly Valid Routing Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .159 8.7.3 Mobility-Safety. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .159 8.8 Pub/sub Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 9 Security. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .165 9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 9.2 Attacks and Threats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . 166 9.2.1 Worms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 9.2.2 Sybil Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 9.2.3 Eclipse Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 9.2.4 File Poisoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 9.2.5 Man-in-the-Middle Attack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .168 9.2.6 DoS Attack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .168 9.3 Securing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.3.1 Self-Certifying Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.3.2 Merkle Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 9.3.3 Information Dispersal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 9.3.4 Secret-sharing Schemes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .171 9.3.5 Smartcards for Bootstrapping Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 9.3.6 Distributed Steganographic File Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .172 9.3.7 Erasure Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 9.3.8 Censorship Resistance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . .173 9.4 Security Issues in P2P Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .174 9.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 9.4.2 Insider Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 9.4.3 Outsider Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 9.4.4 SybilGuard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .177 9.4.5 Reputation Management with EigenTrust. . . . . . . . . . . . . . . . . . . . . . . . . . . . .178 9.5 Anonymous Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 9.5.1 Mixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 9.5.2 Onion Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 9.5.3 Tor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 9.5.4 P2P Anonymization System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 9.5.5 Censorship-resistant Lookup: Achord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 9.5.6 Crowds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 9.5.7 Hordes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .184 9.5.8 Mist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . .185 9.6 Security Issues in Pub/Sub Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 9.6.1 Hermes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 9.6.2 EventGuard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 9.6.3 QUIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 © 2010 Taylor and Francis Group, LLC
  • 14. x Contents 10 Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .189 10.1 Amazon Dynamo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 10.1.1 Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .191 10.1.2 Ring Membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 10.1.3 Partitioning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 10.1.4 Replication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .194 10.1.5 Data Versioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 10.1.6 Vector Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 10.1.7 Coping with Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 10.2 Overlay Video Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 10.2.1 Live Streaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 10.2.2 Video-on-Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 10.3 SIP and P2PSIP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 10.4 CDN Solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .203 10.4.1 Overview . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 10.4.2 Akamai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 10.4.3 Limelight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 10.4.4 Coral. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .208 10.4.5 Comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .211 11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .217 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 © 2010 Taylor and Francis Group, LLC
Preface

Data and media delivery have become hugely popular on the Internet, with well over 1 billion Internet users. Therefore, scalable and flexible information dissemination solutions are needed. Much of the current development pertaining to services and service delivery happens above the basic network layer and the TCP/IP protocol suite because of the need to be able to rapidly develop and deploy them.

In recent years, various kinds of overlay networking technologies have emerged as an active area of research and development. Overlay systems, especially peer-to-peer systems, are technologies that can solve problems in massive information distribution and processing tasks. The key aim of many of these technologies is to offer deployable solutions for processing and distributing vast amounts of information, typically petabytes and more, while at the same time keeping the scaling costs low.

The aim of this book is to present the state of the art in overlay technologies, examine the key structures and algorithms used in overlay networks, and discuss their applications. Overlay networks have been a very active area of research and development during the last 10 years, and a substantial amount of scientific literature has formed around this topic.

This book has been inspired by the teaching notes and articles of the author in content-based routing. The book is designed not only as a reference for overlay technologies, but also as a textbook for a course in distributed overlay technologies and information networking at the graduate level.
About the Author

Sasu Tarkoma received his M.Sc. and Ph.D. degrees in Computer Science from the University of Helsinki, Department of Computer Science. He is currently a professor at Helsinki University of Technology, Department of Computer Science and Engineering. He has recently been appointed full professor at the University of Helsinki, Department of Computer Science. He has managed and participated in national and international research projects at the University of Helsinki, Helsinki University of Technology, and Helsinki Institute for Information Technology (HIIT). He has worked in the IT industry as a consultant and chief system architect, and he is a principal member of research staff at Nokia Research Center. He has over 100 publications, and has also contributed to several books on mobile middleware.

Ms. Nelli Tarkoma produced most of the diagrams used in this book.
1 Introduction

1.1 Overview

In recent years, various kinds of overlay networking technologies have emerged as an active area of research and development. Overlay systems, especially peer-to-peer (P2P) systems, are technologies that can solve problems in massive information distribution and processing tasks. The key aim of many of these technologies is to offer deployable solutions for processing and distributing vast amounts of information, typically petabytes and more, while at the same time keeping the scaling costs low.

Data and media delivery have become hugely popular on the Internet. Currently there are over 1.4 billion Internet users, well over 3 billion mobile phones, and 4 billion mobile subscriptions. By 2000 the Google index reached the 1 billion indexed web resources mark, and by 2008 it reached the trillion mark.

Multimedia content, especially video, is paving the way for truly versatile network services that both compete with and extend existing broadcast-based media. As a consequence, new kinds of social collaboration and advertisement mechanisms are being introduced both in the fixed Internet and in the mobile world. This trend is heightened by the ubiquitous nature of digital cameras. Indeed, this has created a lot of interest in community-based services, in which users create their own content and make it available to others.

These developments have had a profound impact on network requirements and performance. Video delivery has become one of the recent services on the Web with the advent of YouTube [67] and other social media Web sites. Moreover, the network impact is heightened by various P2P services. Estimates of the P2P share of network traffic range from 50% to 70%. Cisco’s latest traffic forecast for 2009–2013 indicates that annual global IP traffic will reach 667 exabytes in 2013, two-thirds of a zettabyte [79]. An exabyte (EB) is an SI unit of information, and 1 EB equals 10^18 bytes.
The exabyte is followed by the zettabyte (1 ZB = 10^21 bytes) and the yottabyte (1 YB = 10^24 bytes). The traffic is expected to increase some 40% each year. Much of this increase comes from the delivery of video data in various forms. Video delivery on the Internet will see a huge increase, and the volume of video delivery in 2013 is expected to be 700 times the capacity of the US Internet backbone in 2000. The study anticipates that video traffic will account for 91% of all consumer traffic in 2013.

According to the study, P2P traffic will continue to grow but will become a smaller component of Internet traffic in terms of its current share. The P2P systems of 2009 transfer 3.3 EB of data per month. The study indicates that the P2P share of consumer Internet traffic will drop to 20% by 2013, down from 50% at the end of 2008. Even though the P2P share may drop, most video delivery solutions, accounting for much of the traffic increase, will utilize overlay technologies, which makes this area crucial for ensuring efficient and scalable services.
A P2P network consists of nodes that cooperate in order to provide services to each other. A pure P2P network consists of equal peers that are simultaneously clients and servers. The P2P model differs from the client-server model, where clients access services provided by logically centralized servers.

To date, P2P delivery has not been successfully combined with browser-based operation and media sites such as YouTube. Nevertheless, a number of businesses have realized the importance of scalable data delivery. For example, the game company Blizzard uses P2P technology to distribute patches for the World of Warcraft game. Given the heavy use of the network, P2P protocols such as BitTorrent offer to reduce network load by peer-assisted data delivery. This means that peer users cooperate to transfer large files over the network.

1.2 Overlay Technology

Data structures and algorithms are central to today’s data communications. We may consider circuit switching technology as an example of how information processing algorithms are vital for products and how innovation changes markets. Early telephone systems were based on manual circuit switching. Everything was done using human hands. Later systems used electromechanical devices to connect calls, but they required laborious preconfiguration of telephone numbers and had limited scalability. Modern digital circuit switching algorithms evolved from these older semiautomatic systems and optimize the number of connections in a switch. The nonblocking minimal spanning tree algorithm enabled the optimization of these automatic switches. Any algorithm used to connect millions of calls must be proven to be correct and efficient. The latest development changes the fundamentals of telephone switching, because information is forwarded as packets on a hop-by-hop basis and not via preestablished physical circuits.
Today, this complex machinery enables end-to-end connectivity irrespective of time and location.

Data structures are at the heart of the Internet. Network-level routers use efficient algorithms for matching data packets to outgoing interfaces based on prefixes. Internet backbone routers have to manage 200,000 routes or more in order to route packets between systems. The matching techniques include prefix trees (tries) and ternary content addressable memories (TCAMs) [268], which have to balance matching efficiency against router memory. Therefore, just as with telephone switches, optimization plays a major role in the development of routers and routing systems.

The current generation of networks is being developed on top of the TCP/IP network layer (layer 3 in the Open Systems Interconnection (OSI) stack). These so-called overlay networks come in various shapes and forms. Overlays make many implementation issues easier, because network-level routers do not need to be changed. In many ways, overlay networks represent a fundamental paradigm shift compared to older technologies such as circuit switching and hierarchical routing.

Overlay networks are useful in both control plane and content plane scenarios. This division of traffic into control and content is typical of current telecommunications solutions such as the session initiation protocol (SIP); however, this division does not exist on the current Internet as such. As control plane elements, overlays can be used to route control messages and connect different entities. As content plane elements, they can participate in data forwarding and dissemination.
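The prefix-based matching of packets to outgoing interfaces mentioned above can be illustrated with a small sketch. It assumes a binary trie, one common software structure for longest-prefix lookup, and is not taken from the book; the class name PrefixTrie, the interface names, and the example routes are hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Binary trie mapping IPv4 prefixes to outgoing interfaces (illustrative)."""

    def __init__(self):
        # Each trie node is a dict with optional "0"/"1" children
        # and an optional "iface" entry for a prefix ending here.
        self.root = {}

    def insert(self, prefix, interface):
        net = ipaddress.ip_network(prefix)
        # Keep only the significant leading bits of the network address.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["iface"] = interface

    def lookup(self, address):
        """Return the interface of the longest prefix that matches address."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("iface")  # default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            # A deeper match is a longer prefix, so it overrides the earlier one.
            best = node.get("iface", best)
        return best


table = PrefixTrie()
table.insert("0.0.0.0/0", "eth2")    # default route
table.insert("10.0.0.0/8", "eth0")
table.insert("10.1.0.0/16", "eth1")
print(table.lookup("10.1.2.3"))
```

A lookup walks at most 32 levels for IPv4, independent of the number of routes. Hardware routers often use TCAMs instead, which trade memory and power for a constant-time parallel match.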
An overlay network is a network that is built on top of an existing network. The overlay therefore relies on the so-called underlay network for basic networking functions, namely routing and forwarding. Today, most overlay networks are built in the application layer on top of the TCP/IP networking suite. Overlay technologies can be used to overcome some of the limitations of the underlay, at the same time offering new routing and forwarding features without changing the routers.

The nodes in an overlay network are connected via logical links that can span many physical links. A link between two overlay nodes may take several hops in the underlying network.

An overlay network therefore consists of a set of distributed nodes, typically client devices or servers, that are deployed on the Internet. The nodes are expected to meet the following requirements:

1. Support the execution of one or more distributed applications by providing infrastructure for them.
2. Participate in and support high-level routing and forwarding tasks. The overlay is expected to provide data-forwarding capabilities that are different from those that are part of the basic Internet.
3. Deploy across the Internet in such a way that third parties can participate in the organization and operation of the overlay network.

Figure 1.1 presents a layered view of overlay networks. The view starts from the underlay, the network that offers the basic primitives of sending and receiving messages (packets). The two obvious choices today are UDP and TCP as the transport layer protocols. TCP is favored due to its connection-oriented nature, congestion control, and reliability.

After the underlay layer, we have the custom routing, forwarding, rendezvous, and discovery functions of the overlay architecture. Routing pertains to the process of building and maintaining routing tables.
Forwarding is the process of sending messages toward their destination, and rendezvous is a function that is used to resolve issues regarding some identifier or node, for example by offering indirection support in the case of mobility. Discovery is an integral part of this layer and is needed to populate the routing table by discovering both physically and logically nearby neighbors.

FIGURE 1.1 Layered view to overlay networks. (Layers, from bottom to top: network; routing, forwarding, rendezvous, discovery; security and resource management, reliability, fault tolerance; services management; applications, services, tools.)
The next layer introduces additional functions, such as security and resource management, reliability support, and fault tolerance. These are typically built on top of the basic overlay functions mentioned above. Security pertains to the way node identities are assigned and controlled, and to the way messages and packets are secured. Security encompasses multiple protocol layers and is responsible for ensuring that peers can maintain a sufficient level of trust toward the system. Resource management is about taking content demand and supply into account and ensuring that certain performance and reliability requirements are met. For example, relevant issues are data placement and replication rate. Data replication is also a basic mechanism for ensuring fault tolerance. If one node fails, another can take its place and, given that the data was replicated, there is no loss of information.

Above this layer, we have the services management for both monitoring and controlling service lifecycles. When a service is deployed on top of an overlay, there need to be functions for administering it and controlling various issues such as administrative boundaries, and data replication and access control policies.

Finally, in the topmost layer we have the actual applications and services that are executed on top of the layered overlay architecture. The applications rely on the overlay architecture for scalable and resilient data discovery and exchange.

An overlay network offers a number of advantages over both centralized solutions and solutions that introduce changes in routers. These include the following three key advantages:

Incremental deployment: Overlay networks do not require changes to the existing routers. This means that an overlay network can be grown node by node, and with more nodes it is possible to both monitor and control routing paths across the Internet from one overlay node to another.
An overlay network can be built based on standard network protocols and existing APIs, for example the Sockets API of the TCP/IP protocol stack.

Adaptable: The overlay algorithm can utilize a number of metrics when making routing and forwarding decisions. Thus the overlay can take into account application-specific concerns that are not currently addressed by the Internet infrastructure. Key metrics include latency, bandwidth, and security.

Robust: An overlay network is robust to node and network failures due to its adaptable nature. With a sufficient number of nodes in the overlay, the network may be able to offer multiple independent (router-disjoint) paths to the same destination. At best, overlay networks are able to route around faults.

The designers of an early overlay system called resilient overlay network (RON) [361] used the idea of alternative paths to improve performance and to route around network faults. Figure 1.2 illustrates how overlay technology can be used to route around faults. In this example, there is a problem with the normal path between A and B across the Internet. Now, the overlay can use a so-called detour path through C to send traffic to B. This will result in some networking overhead but can be used to maintain communications between A and B.

Overlay networks also face a number of challenges and limitations. The three central challenges include the following:

• The real world: In practice, the typical underlay protocol, IP, does not provide universal end-to-end connectivity due to the ubiquitous nature of firewalls and network address translation (NAT) devices. This means that special solutions are needed to overcome reachability issues. In addition, many overlay networks are oblivious to the current organizational and management structures that exist in applications
and also in network designs. For example, most of the overlay solutions presented in this book do not take Internet topology into account from the viewpoint of the autonomous systems (ASs) and inter-AS traffic.
• Management and administration: Practical deployment requires that the overlay network have a management interface. This is relatively easy to realize for a single administrative domain; however, when there are many parties involved, the management of the overlay becomes nontrivial. Indeed, at the moment most overlays involve a single administrative domain. The administrator of an overlay network is typically removed from the actual physical devices that participate in the overlay. This requires advanced techniques for detecting failed nodes or nodes that exhibit suspect behaviors.
• Overhead: An overlay network typically consists of a heterogeneous body of devices across the Internet. It is clear that the overlay network cannot be as efficient as the dedicated routers in processing packets and messages. Moreover, the overlay network may not have adequate information about the Internet topology to properly optimize routing processes.

FIGURE 1.2 Improving resiliency using overlay techniques. (Nodes A, B, and C are connected by logical links across the Internet; the normal path between A and B is marked as failed, and traffic is routed around the problem.)

FIGURE 1.3 Taxonomy of overlay networks. (Labels: router-based (IP multicast); no router support; infrastructure-centric (CDNs); end-systems only; end-systems with infrastructure support; overlay multicast.)

Figure 1.3 presents a taxonomy of overlay systems. Overlays can be router-based, or they can be completely implemented on top of the underlay, typically TCP/IP. Router-based overlays typically employ IP Multicast [107, 130] and IP Anycast [106] features; however, given the fact that deployment of the next version of the IP protocol, IPv6 [106], has not progressed according to most optimistic expectations, these extensions are not
globally supported on the Internet. If the routers only provide basic unicast end-to-end communication, information networking functions need to be provided by the overlay.

Content delivery networks (CDNs) are examples of overlay networks that cache and store content and allow efficient and less costly ways to distribute data on a massive scale. CDNs typically do not require changes to end-systems, and they are not P2P solutions from the viewpoint of the end clients.

The two remaining categories illustrated in Figure 1.3 are end-systems with and without infrastructure support, respectively. The former combines fixed infrastructure with software running in the end-systems in order to realize efficient data distribution. The latter category does not involve fixed infrastructure, but rather establishes the overlay network in a decentralized manner.

Overlay networks allow the introduction of more complex networking functionality on top of the basic IP routing functionality. For example, filter-based routing, onion routing, distributed hash tables (DHTs), and trigger-based forwarding are examples of new kinds of communication paradigms.

DHTs are a class of decentralized distributed algorithms that offer a lookup service. DHTs store (key, value) pairs, and they support the lookup of the value associated with a given key. The keys and values are distributed in the system, and the DHT system must ensure that the nodes have sufficient information about the global state to be able to forward and process lookup requests properly. The DHT algorithm is responsible for distributing the keys and values in such a way that efficient lookup of the value corresponding to a key becomes possible. Since peer nodes may come and go, this requires that the algorithm be able to cope with changes in the distributed system.
In addition, the locality of data plays an important part in all overlays, since they are executed on top of an existing network, typically the Internet. The overlay should take the network locations of the peers into account when deciding where data is stored, and where messages are sent, in order to minimize networking overhead.

Figure 1.4 illustrates the key DHT API functions that allow peers to insert, look up, and remove values associated with a key. Typically, the key is a hash value, a so-called flat label, which essentially realizes a flat namespace that can be used by the DHT algorithm to optimize processing.

DHTs are a class of decentralized distributed systems. They provide a logically centralized lookup service similar to hash tables. A DHT stores (key, value) pairs and allows a client to retrieve a value associated with a given key. The DHT is typically realized as a structured P2P network in which peers cooperate to provide the service across the Internet.

FIGURE 1.4 DHT API. (Distributed applications call put(key, value), delete(key, value), and get(key), which returns a value, on the distributed hash table; the DHT balances keys and data across the nodes.)
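To make the put/get/delete interface of Figure 1.4 concrete, the following is a minimal single-process sketch. It assumes a consistent-hashing ring in which each key is stored on the first node whose identifier follows the key's hash; the book does not prescribe this scheme here, and the names ToyDHT, flat_label, and responsible_node are hypothetical. For brevity, delete takes only the key, unlike the delete(key, value) shown in the figure, and there is no networking, replication, or node churn.

```python
import bisect
import hashlib


def flat_label(text):
    """Hash a string to a 160-bit integer identifier (a flat label)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """In-memory stand-in for a DHT: nodes sit on a hash ring and share the keys."""

    def __init__(self, node_names):
        # Sort the nodes by their position on the identifier ring.
        self.ring = sorted((flat_label(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        # Per-node local storage; a real system keeps this on separate machines.
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key):
        # First node clockwise from the key's hash, wrapping around the ring.
        index = bisect.bisect_left(self.ids, flat_label(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.storage[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.storage[self.responsible_node(key)].get(key)

    def delete(self, key):
        self.storage[self.responsible_node(key)].pop(key, None)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("song.mp3", "10.0.0.7")
print(dht.responsible_node("song.mp3"), dht.get("song.mp3"))
```

Because placement depends only on the hash of the key, any participant that knows the ring can compute which node is responsible. Real DHTs replace the global ring view with per-node routing tables so that a lookup takes a bounded number of overlay hops.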
There are two main classes of P2P networks, structured and unstructured. In the former type, the overlay network topology is tightly controlled by the P2P system and content is distributed in such a way that queries can be made efficiently. Many structured P2P systems utilize DHT algorithms in order to map object identifiers to distributed nodes. Unstructured P2P networks do not have such tightly controlled structure, but rather they utilize flooding and similar opportunistic techniques, such as random walks and expanding-ring time-to-live (TTL) search, for finding peers that host interesting data. Each peer receiving a query can then evaluate the query locally using its own content. This allows unstructured P2P systems to support more complex queries than are typically supported by structured DHT-based systems.

Unstructured P2P algorithms are called first generation and the structured algorithms are called second generation. They can also be combined to create hybrid systems. The key-based structured algorithms have a desirable property: namely, that they can find data locations within a bounded number of overlay hops [162]. The unstructured broadcasting-based algorithms, although resilient to network problems, may have large routing costs due to flooding, or may be unable to find available content [274].

Another approach to P2P systems is to divide them into two classes, pure and hybrid P2P systems. In the former, each peer is simultaneously a client and a server, and the operation is decentralized. In the latter class, a centralized component is used to support the P2P network.

Figure 1.5 illustrates the inherent trade-off between completeness and expressiveness of an overlay system. By completeness we mean the ability of the system to guarantee the location and retrieval of a piece of data. Expressiveness pertains to the system’s ability to reason about the data, for example how complex the queries used to locate data elements can be.
DHTs and other structured overlays typically guarantee completeness, whereas unstructured systems, such as Gnutella and Freenet, do not provide such guarantees. As an inherent limitation, structured systems support less complex queries, typically the lookup of keys. Unstructured systems, on the other hand, can support complex query processing. In this book, we cover both structured and unstructured systems and highlight their key properties.

[Figure 1.5: Balancing completeness and expressiveness in overlays. Labels on the expressiveness axis: key-based routing, key-based range queries, attribute-based queries, content-based routing. Labels on the completeness axis: no guarantees, guarantees. Systems placed in the plot: DHT, hybrid, unstructured.]
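The flooding and expanding-ring TTL search used by unstructured overlays, and the reason they cannot guarantee completeness, can be illustrated with a small Python sketch. The function names, the adjacency-dictionary graph, and the per-peer content sets are all assumptions made for this example; deployed protocols differ in many details, such as duplicate suppression and how replies are routed back.

```python
from collections import deque


def flood_search(graph, content, start, wanted, ttl):
    """Breadth-first flood from `start`, visiting peers at most `ttl` hops away.

    Returns (peers_holding_wanted_item, number_of_query_messages_sent).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    hits, messages = [], 0
    while frontier:
        peer, hops = frontier.popleft()
        # Each peer evaluates the query locally against its own content.
        if wanted in content.get(peer, ()):
            hits.append(peer)
        if hops == ttl:
            continue  # TTL exhausted: do not forward any further
        for neighbour in graph[peer]:
            messages += 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return hits, messages


def expanding_ring_search(graph, content, start, wanted, max_ttl):
    """Retry the flood with a growing TTL until something is found."""
    total_messages = 0
    for ttl in range(1, max_ttl + 1):
        hits, messages = flood_search(graph, content, start, wanted, ttl)
        total_messages += messages
        if hits:
            return hits, ttl, total_messages
    return [], max_ttl, total_messages
```

If the only copy of an item sits more than `max_ttl` hops away, the search returns nothing even though the item exists. This is the lack of completeness discussed above, and the message counter shows how the cost grows with each enlarged ring.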
1.3 Applications

Many overlay networks have been proposed both in the research community and by Internet and Web companies. Overlay networks can be categorized into the following classes [80]:

• P2P file sharing: For sharing media and data. For example, Napster, Gnutella, KaZaA.
• CDN: Content caching to reduce delay and cost. For example, Akamai and LimeLight.
• Routing and forwarding: Reduce routing delays and cost, resiliency, flexibility. For example, resilient overlay network (RON), Internet indirection infrastructure (i3).
• Security: To enhance end-user security and offer privacy protection. For example, virtual private networks (VPNs), onion routing, anonymous content storage, censorship-resistant overlays.
• Experimental: Offer testing ground for experimenting with new technologies. For example, PlanetLab.
• Other: Offer enhanced communications. For example, e-mail, VoIP, multicast, publish/subscribe, delay-tolerant operation, etc.

Currently a significant amount of content is being served using decentralized P2P overlays. Most of the deployed algorithms are based on unstructured overlays. The unstructured P2P protocol BitTorrent has become a popular content distribution protocol in recent years. P2P technologies are not commonly used with CDNs; however, they are increasingly used by end clients. P2P offers end client–assisted data distribution, in which clients acting as peers upload data. This contrasts with the traditional client-server CDN model, in which clients do not upload data.

The main strength of P2P is in the delivery of massively popular data items; however, items that fall into the long tail may not be cost-efficient to distribute using P2P. This can be alleviated by storing data items on client machines using caching, but this requirement is not favored by many users.

1.4 Properties of Data

In this section, we briefly discuss the properties of data [117, 120, 228].
Data can be characterized in many ways. We consider an example taxonomy in Figure 1.6 that divides data into two parts: stored data and real-time data.

Stored data consists of bits that are stored on a system on a more permanent basis in such a way that the data can be made available later. This data can take two forms: it can be mutable or immutable. Mutable data can be shared and modified by various entities either locally or in the distributed environment. Mutable data can be made incrementally available, and it can be created and managed by multiple entities. On the other hand, mutable data is not easy to cache and it requires complicated security solutions, especially in distributed environments. Immutable data means that the full data (for example, a picture or a video file) is available, and it does not change. This data can therefore be cached and verified easily.

Real-time data is generated on the fly and transmitted over the network. The data is packetized, possibly on multiple layers, and it is transferred hop-by-hop on a store-and-forward basis. This means that, although individual packets of the data are stored in intermediate buffers, the whole data is not stored as such. In addition, with real-time data, the time when the data is inserted into the network plays a crucial part.

[Figure 1.6: Taxonomy of data. Data divides into stored data (mutable or immutable) and real-time data (streaming or signaling). Mutable data: data sharing, data incrementally available, secure operation needs solutions, not easy to cache. Immutable data: full data is available, easy to cache, easy to verify. Streaming and signaling: data only incrementally available, cannot be cached.]

Streaming data is only incrementally available, and only the latest packets of this stream are important. This means that this kind of data cannot be cached. Another form of real-time data is signaling. In this case, data also becomes incrementally available and cannot be cached; however, the data packets are typically very different from streaming.

References play an important part in distributed systems. A reference encapsulates a relationship between itself and a referent defined relative to the state of some physical system. As examples we may consider memory addresses that point to some specific locations of physical memory and uniform resource locators (URLs) that point to Web resources located on specific servers, available using a specific protocol such as the hypertext transfer protocol (HTTP). If the physical system changes (for example, memory is swapped or a server is relocated), the referent changes as well. These so-called physical references may become invalid when the environment changes.

In order to cope with changes in the environment, the common practice is to introduce a level of indirection into the reference system. For example, the domain name system (DNS) binds host names to IP addresses, which allows administrators to change IP addresses without changing host names. The hierarchical and replicated structure of DNS scales well for its intended purposes, and it is at the core of the Internet.
A data element can be either mutable or immutable. In the former case it can change, and in the latter case it cannot. A mutable data element can be represented by a sequence (or a graph) of immutable data elements. Given that a piece of data does not change, it can be uniquely and succinctly summarized using a hash function. We note that hashes only provide probabilistic uniqueness; however, a long enough hash bitstring results in a vanishingly small probability of collision.

A hash function is a function from a sequence of bytes to a fixed-size sequence of bits, a bitstring. Hash functions can be characterized based on how easy it is to find a collision [227]:

• A hash function is strongly collision resistant if it is not computationally feasible to find two different input data items which have the same hash.
• A hash function is weakly collision resistant if, for a given data item, it is not computationally feasible to find another data item that has the same hash.
• A hash function is probabilistically collision resistant if, for a given input data item, the probability that a randomly chosen data item will have the same hash as the input data item is extremely small.

Semantic-free references have been proposed to achieve persistence and freedom from contention in a naming system [20, 339]. The idea is to use a reference namespace devoid of explicit semantics, for example, one based on hashed identifiers. This means that a reference should not contain information about the organization, administrative domain, or network provider. Flat semantic-free references contrast with DNS-based URLs because they have no explicit structure. The semantic-free referencing method uses DHTs to map each object reference to a machine that contains object metadata. The metadata typically includes the object's current network location and other information.

Until recently, there have been no good candidate solutions for resolving semantic-free names in a scalable fashion in the distributed environment. The traditional solution has been to use a partitioned set of context-specific name resolvers. The emerging overlay DHT technology can be used to efficiently store and look up semantic-free references. Indeed, the so-called self-certified flat labels have gained widespread adoption in recent overlay systems.

Self-certifying data is data whose integrity can be verified by the client accessing it [227]. A node inserting a file in the network or sending a packet calculates a cryptographic hash of the content using a known hash function. This hashing produces a file key that is included in the data. The node may also sign the hash value with its private key and include its public key with the data. This additional process allows other nodes to authenticate the original source of the data.
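The hashing step of self-certifying data can be sketched with Python's standard `hashlib` module. The choice of SHA-256, the function names, and the plain dictionary standing in for the network are assumptions made for illustration. The optional signing step is left out because it would need a signature library. The sketch also includes the matching integrity check that a retrieving node would perform.

```python
import hashlib


def file_key(content: bytes) -> str:
    """Content-derived key: the SHA-256 digest of the data, hex encoded."""
    return hashlib.sha256(content).hexdigest()


def publish(store: dict, content: bytes) -> str:
    """Insert content under its own hash and return the key."""
    key = file_key(content)
    store[key] = content
    return key


def retrieve(store: dict, key: str) -> bytes:
    """Fetch by key and recompute the hash to check integrity."""
    content = store[key]
    if file_key(content) != key:
        raise ValueError("content does not match its key")
    return content
```

Because the key is derived from the content, a client can detect tampering without trusting the node that served the data. The check says nothing about who produced the data, which is what the signature step adds.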
When a node retrieves the data using the hash of the data as the key, it calculates the same hash function to verify that the data integrity has not been compromised.

A large part of the research and development on P2P systems has focused on data-centric operation, which emphasizes the properties of the data instead of the location of the data. Ideally, the clients of the distributed system are not interested in where a particular data item is obtained as long as the data is correct. The notion of data-centricity allows the implementation of various dynamic data discovery, routing, and forwarding mechanisms [274].

In content-based routing systems, hosts subscribe to content by specifying filters on messages. In content-based routing, the content of messages defines their ultimate destination in the distributed system. Information subscribers use an interest registration facility provided by the network to set up and tear down data delivery paths. Data-centric and content-based communications are currently being investigated as possible candidates for Internet-wide communications.

1.5 Structure of the Book

After this introductory chapter, which motivates overlay technology and outlines several application scenarios, we start with an overview of networking technology in Chapter 2. This chapter briefly examines the TCP/IP protocol suite and the basics of networking, such as naming, addressing, routing, and multicast. The chapter forms the basis for the following chapters, because typically TCP/IP is the underlay of the overlay networks and thus understanding its features and properties is vital to the development of efficient overlay solutions.

We discuss properties of networks in Chapter 3, including the growth of the Internet, trends in networking, and how data can be modeled. Many of the overlay algorithms are based on the observation that networks exhibit power law degree distributions. This can then be used to create better routing algorithms.

In Chapter 4 we examine a number of unstructured P2P overlay networks. Many of these solutions can be seen to be part of the first generation of P2P and overlay networks; however, they can also be combined with structured approaches to form hybrid solutions. We cover protocols such as Gnutella, BitTorrent, and Freenet and present a comparison of them. This chapter places special emphasis on BitTorrent, because it has become the most frequently used P2P protocol.

Chapter 5 presents the foundations of structured overlays. We consider various geometries and their properties that have been used to create DHTs. The chapter also presents consistent hashing, which is the basis for the scalability of many DHTs.

After surveying the foundations and basic cluster-based solutions, we then examine a number of structured algorithms in Chapter 6. Structured overlay technologies place more assumptions on the way nodes are organized in the distributed environment. We analyze algorithms such as Plaxton's algorithm, Chord, Pastry, Tapestry, Kademlia, CAN, Viceroy, Skip Graphs, and others. The algorithms are based on differing structures, such as hypercubes, rings, tori, butterflies, and skip graphs. The chapter also considers some advanced issues, such as adding hierarchy to overlays.

Many P2P protocols and overlay networks utilize probabilistic techniques to reduce processing and networking costs. Chapter 7 presents a number of frequently used and useful probabilistic techniques.
Bloom filters and their variants are of prime importance, and they are heavily used in various network solutions. The chapter also examines epidemic algorithms and gossiping, which are also the foundation of a number of overlay solutions.

As observed in this chapter, data-centric and content-centric operation offer new possibilities regarding data caching, replication, and location. Recently, content-based routing has become an active research area. In Chapter 8 we consider content-centric routing and examine a number of protocols and algorithms. Special emphasis is placed on distributed publish/subscribe, in which content is targeted to active subscribers.

Given the scalable and flexible distribution solutions enabled by P2P and overlay technologies, we are faced with the question of security risks. The authenticity of data and content needs to be ensured. Required levels of anonymity, availability, and access control also must be taken into account. Chapter 9 examines the security challenges of P2P and overlay technologies, and then outlines a number of solutions to mitigate the examined risks. Issues pertaining to identity, trust, reputation, and incentives need to be analyzed.

Chapter 10 considers applications of overlay technology. Amazon's Dynamo is considered as an example of an overlay system used in a production environment that combines a number of advanced distributed computing techniques. We also consider video-on-demand (VoD) in this chapter. Much of the expected IP traffic increase in the coming years will come from the delivery of video data in various forms. Video delivery on the Internet will see a huge increase, and the volume of video delivery in 2013 is expected to be 700 times the capacity of the US Internet backbone in 2000. The remainder of the chapter examines P2P SIP for telecommunications signaling, and content distribution technologies.
Finally, we conclude in Chapter 11 and summarize the current state of the art in overlay technology and the future trends. The chapter outlines the main usage cases for P2P and overlay technologies for applications and services.
2 Network Technologies

This chapter examines the TCP/IP protocol suite and the basics of networking, such as naming, addressing, routing, and multicast. The chapter forms the basis for the following chapters, because typically TCP/IP is the underlay of the overlay networks and thus understanding its features and properties is vital for the development of efficient overlay solutions. The chapter places emphasis on interdomain routing, because it is key for scalable and policy-compliant global networking. Overlay solutions should ensure that the underlay is used in an efficient and policy-compliant manner [203].

2.1 Networking

TCP/IP forms the basis of the current Internet, and it is generally described as having four abstraction layers, namely the link layer, network layer, transport layer, and application layer. This layered view is often compared with the seven-layer OSI reference model.

Design principles, outlined in RFC 1122, have had a major influence on the development of the current Internet [106]. The two key design principles for the Internet were the end-to-end principle and the robustness principle [81].

The end-to-end principle places the maintenance of state and overall intelligence at the edges and assumes the core Internet retains no state [282]. Today's real-world needs for firewalls, network address translation (NAT), and web content caches have essentially made this principle impossible to follow in practice.

The robustness principle can be summarized as follows: be conservative in what you do, be liberal in what you accept from others. The principle suggests that Internet software developers carefully write software that adheres closely to extant RFCs but accept and parse input from clients that might not be consistent with those RFCs. As stated in RFC 1122, adaptability to change must be designed into all levels of Internet host software.
The network layer in the TCP/IP model is responsible for realizing internetworking and uses the IP protocol to deliver data from upper layers between end hosts. The protocol suite separates host names from topological addresses by using name resolution. The domain name system (DNS) is responsible for resolving hierarchical host names to topological IP addresses [231]. This effectively separates naming from addressing, and even if the naming system, namely DNS, fails, the underlying routing can still function independently. DNS also allows the definition of organizational boundaries that are independent of the network topology.

A routing algorithm is responsible for building and maintaining routing tables. A forwarding algorithm is responsible for determining the next hop given a destination address. Packet routing involves the use of routing and forwarding algorithms and protocols for deciding where an incoming packet should be sent. The two main classes are intradomain and interdomain protocols. Intradomain protocols are applied within an autonomous system (AS), for example, a metropolitan area network (MAN) or regional network, and interdomain protocols are used to connect the different ASs together to form a global network topology. Typical examples of these protocols are open shortest path first (OSPF) for intradomain operation and border gateway protocol (BGP) for interdomain operation.

The communications models offered by the Internet can be categorized into the following cases. In unicasting, a packet traverses a sequence of links from a source to a destination. The majority of traffic on the Internet is unicast. In multicasting, a packet selectively traverses multiple chains of links from typically one source to multiple destinations. In broadcasting, a packet is sent on multiple links to every device on the network. In practice, broadcast is applied only within a specific broadcast domain. In anycasting, a suitable chain of links is selected from a number of possible candidates. Packets are sent to the nearest or best destination.

Of the above communication models, the currently dominant IP version 4 protocol supports only unicasting on a global scale. The next version of IP, version 6, offers these other communication models as well; however, IPv6 deployment has not progressed according to some optimistic expectations, and it remains to be seen when the new protocol is globally deployed. The Internet is based on hierarchical routing, in which autonomous systems (ASs) are connected by peering and transit links. Each AS can run its own local routing algorithm, and BGP is used for interdomain connectivity.

Figure 2.1 illustrates the interoperable nature of the IP protocol. The network layer provides global addressing and end-to-end reachability, and thus abstracts the applications from the details of routing and forwarding. The IP protocol supports a number of underlying links and physical layer protocols, which makes it the waist of the protocol stack.
Higher-level features diverge from IP and support different operating environments. The network layer therefore minimizes the number of service interfaces and maximizes interoperability.

[Figure 2.1: Hourglass model in networking. Labels: diverse applications, transport layer (TCP/IP), diverse physical layers, convergence, divergence.]
Middleware provides additional services on top of the networking stack and below the applications. Most overlay and P2P technologies can be thought to be part of middleware. As middleware, they utilize the APIs and features of the underlying protocol stack and network and offer their own APIs for application developers. The motivation for this layer is that it can abstract many details pertaining to the underlying layers and thus make it easier to develop and run distributed software.

TCP/IP applications use either a host name or an IP address. The former requires a DNS lookup to resolve the IP address, whereas the latter is directly routable. Recently there have been a number of proposals for adding further indirection into the protocol architecture by means of a locator-identity split. In general, the split would allow various identifiers, for example, cryptographic identifiers [14, 188, 243], to be mapped to IP addresses. The motivation for the locator-identity split is increased flexibility and de-emphasizing the central role of IP addresses as end-point identifiers.

2.2 Firewalls and NATs

The present-day Internet has seen ubiquitous deployment of firewalls and network address translators (NATs). Both are used to control data communications between subnetworks. Firewalls are hardware or software components that block certain incoming connections. Their main motivation is to increase security by preventing unauthorized connections to a device. NAT devices, on the other hand, perform conversion between different address spaces, typically private and public networks (Figure 2.2). The motivation for NATs is that they offer a certain level of security and allow the use of private IP address spaces, thus alleviating IP address exhaustion concerns and some network management concerns as well.

A NAT involves the translation of an IP address used within one network to a different IP address known within another network.
Typically, a NAT maps its local private network addresses to one or more global outside IP addresses and then performs reverse mapping of the global IP addresses on incoming packets back into private IP addresses.

[Figure 2.2: Example of network address translation. Hosts with private addresses A and B sit behind a NAT with a public address. Translation table: inside local IP address A, inside port 1000, maps to the public IP, out port 2000; inside local IP address B, inside port 1001, maps to the public IP, out port 2001.]
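The translation table of Figure 2.2 can be modeled as two dictionaries, one per direction. The following Python sketch is a simplified illustration. The class `ToyNat`, its method names, the sequential port allocation, and the documentation-range addresses used in the usage note are assumptions made here, and real NAT devices also track protocol, timers, and remote endpoints.

```python
import itertools


class ToyNat:
    """Port-translating NAT with a single public address (hypothetical model)."""

    def __init__(self, public_ip, first_public_port=2000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_public_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        inside = (private_ip, private_port)
        if inside not in self._outbound:
            public_port = next(self._next_port)
            self._outbound[inside] = public_port
            self._inbound[public_port] = inside
        return self.public_ip, self._outbound[inside]

    def translate_incoming(self, public_port):
        """Reverse-map an incoming packet; None means no mapping, so drop it."""
        return self._inbound.get(public_port)
```

With `nat = ToyNat("203.0.113.5")`, a packet from `("10.0.0.1", 1000)` leaves as `("203.0.113.5", 2000)` and one from `("10.0.0.2", 1001)` as `("203.0.113.5", 2001)`, mirroring the two rows of the figure. An incoming packet for a public port with no mapping has nowhere to go, which is why unsolicited inbound traffic fails.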
There are a variety of NAT devices and a variety of network topologies utilizing NAT devices in deployments. NAT devices support private IP addressing domains that are not globally reachable. Typically, client-initiated connections create soft state in the NAT devices so that responses to requests can be sent to the hosts in the private domain. There are four general types of NAT devices, based on how they perform the address mapping:

• Full cone NAT maps an internal address to an external address in one-to-one fashion, and it is easy to traverse.
• Restricted cone NAT maps an internal address (and port) to an external address. Once the internal client has sent a packet to an external host, the external host can send packets back from any port.
• Port-restricted cone NAT is similar to the restricted cone NAT, but the external host can only send from the port on which it received packets from the internal client.
• In symmetric NAT, only an external host that receives packets from the internal host can send packets back.

The asymmetric addressing and connectivity domains established by NAT devices have created unique problems for P2P systems, which realize both client and server functionality at end nodes. NATs may prevent P2P nodes from receiving inbound requests. Although P2P systems build on the end-to-end communications capability of the Internet, in practice the assumption that a peer can receive inbound traffic is often not valid.

A number of techniques have been devised for applications to detect the NATs on the communication path and then configure the communications in such a way that the connection can be established. The communication options depend on the type of NATs. The worst case happens when there are symmetric NATs present, which map each outgoing connection to a new IP address and port number.
This case is solved by using a special rendezvous server that relays all packets between the communicating endpoints [302].

IETF has developed a number of NAT traversal solutions that include connection establishment (STUN), relaying (TURN), and combined solutions for SIP (ICE). The solutions are surveyed and discussed in RFC 5128 [302]. Relaying is the most reliable method of realizing NAT traversal; however, it is also the least efficient, because the relay server's processing power and network capacity are used to relay packets. Another technique is connection reversal for direct communication, which works if only one of the two peers is behind a NAT device. UDP and TCP hole punching can be used to punch holes through NAT devices and establish direct connectivity between peers even when both hosts are behind NATs. Recent analysis results indicate that UDP hole punching works on more than 80% of NAT devices. TCP hole punching is not as frequently supported, with approximately 60% support.

P2P applications may use multiple rendezvous servers for registration, discovery, and relay functions. As an example, Skype uses a central public server for login and a number of different public servers to realize end-to-end relay functionality. Recent studies based on thousands of BitTorrent swarms indicate that roughly half of the peers can be behind firewalls [232]. We return to the Skype and BitTorrent protocols in more detail in Chapter 4.
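The practical difference between the four NAT types listed earlier in this section is which inbound packets an existing mapping lets through. The Python sketch below encodes that filtering rule, following the commonly used classification from RFC 3489. The function name, the string labels, and the representation of endpoints as `(ip, port)` tuples are assumptions made for illustration.

```python
def accepts_inbound(nat_type, contacted, sender):
    """Decide whether an inbound packet passes an existing NAT mapping.

    contacted: set of (ip, port) endpoints the internal host has sent to
               through this mapping.
    sender:    (ip, port) of the external host sending the inbound packet.
    """
    sender_ip, _ = sender
    if nat_type == "full_cone":
        return True  # any external host may use the mapping
    if nat_type == "restricted_cone":
        # Only hosts already contacted may reply, from any source port.
        return any(ip == sender_ip for ip, _ in contacted)
    if nat_type in ("port_restricted_cone", "symmetric"):
        # Symmetric NATs also allocate a separate mapping per destination,
        # so `contacted` holds exactly one endpoint for them.
        return sender in contacted
    raise ValueError(f"unknown NAT type: {nat_type}")
```

Hole punching works when each peer's outgoing packet adds the other peer to `contacted` before the other peer's packet arrives. With a symmetric NAT the mapping seen by the rendezvous server is not the one used toward the peer, which is why relaying remains the fallback.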
2.3 Naming

Names and namespaces are fundamental components of network architectures. In the current Internet, the DNS is responsible for managing the hierarchical domain namespace. The DNS protocol was specified in the early 1980s by the IETF. Much of the flexibility of the current Internet stems from the scalability of both network-level hierarchical routing and the higher-level naming service. DNS has facilitated the deployment of the World Wide Web and e-mail.

DNS is a managed distributed overlay that uses a static distribution tree and a hierarchically organized namespace. The DNS system is a distributed database system implemented using the client-server model, in which the nameservers are responsible for sharing, replicating, and partitioning the domain namespace, and for answering client requests. DNS achieves scalability and resilience by relying extensively on caching and replication. As a consequence, updates to DNS records typically require some time to become globally available. Another limitation of DNS is that it does not have built-in security, which makes it prone to a number of vulnerabilities.

The client side uses a DNS resolver to look up information from DNS. DNS uses UDP for typical requests and TCP for larger transfers. The DNS system supports two different query modes, namely nonrecursive queries and recursive queries. A nonrecursive query places the control at the requesting client, and typically a single DNS server provides only a partial answer to the query. The client can then expand the partial answer by using other nameservers that are identified in the partial answer. A recursive query, on the other hand, places the control of the resolution process at the nameserver, which will then contact other nameservers to answer the query. This latter mode is not a mandatory feature.

The namespace consists of domain names that are organized in a tree structure.
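The two query modes can be contrasted with a toy model in Python. Everything in it is made up for illustration: the `ZONES` table, the server labels, the names under the reserved `example` domain, and the address from a documentation range. No real DNS messages are involved.

```python
# Toy zone data; every name and address here is made up for illustration.
ZONES = {
    "root": {"referrals": {"example.": "ns-example"}},
    "ns-example": {"referrals": {"cs.example.": "ns-cs"}},
    "ns-cs": {"records": {"www.cs.example.": "192.0.2.10"}},
}


def query(server, name):
    """Answer like a nonrecursive nameserver: a record or a referral."""
    zone = ZONES[server]
    if name in zone.get("records", {}):
        return "answer", zone["records"][name]
    for suffix, next_server in zone.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    return "nxdomain", None


def resolve_iteratively(name, server="root"):
    """Client-driven resolution: follow referrals until an answer arrives."""
    path = [server]
    while True:
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind == "nxdomain":
            return None, path
        server = value
        path.append(server)


def resolve_recursively(name, server="root"):
    """Server-driven resolution: the nameserver chases referrals itself."""
    kind, value = query(server, name)
    if kind == "referral":
        return resolve_recursively(name, value)
    return value
```

In the iterative version the client sees every referral, so `path` records each server it had to contact. In the recursive version the caller only ever sees the final value.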
Each domain name in this tree has zero or more resource records that contain information about the name. Each domain name is part of a DNS zone and has one or more authoritative DNS servers. The root level of the hierarchy is served by the root nameservers, which are used to look up a top-level domain name (TLD).

A DNS zone consists of a set of nodes served by an authoritative nameserver. Administrative responsibility for a zone can be divided among multiple nameservers. Moreover, a single nameserver can be responsible for multiple zones. Authority can be delegated for an arbitrary part of a zone, typically in the form of subdomains. In the case of delegation, the new nameserver will become the authoritative nameserver for the delegated namespace.

The Internet Corporation for Assigned Names and Numbers (ICANN) oversees the registrar companies that maintain top-level domains. The domain names have a hierarchical structure, and new hierarchy levels can be assigned under the top-level domains. The DNS domain hierarchy is independent of network topology and network administrative domains. This means that multiple names can be mapped to the same network and same physical server. A name can also map to different IP addresses based on some policy, which is useful in realizing load balancing. The separation of naming and addressing thus provides flexibility by allowing more fine-grained policies to be implemented.

The DNS service has been designed to accept queries pertaining to host names and IP addresses. A DNS client can perform a lookup to translate a hostname to an IP address, translate an IP address to a hostname, and obtain published information about a host (typically the MX record for e-mail SMTP server details).

Figure 2.3 illustrates how DNS is used. When a client needs to obtain information about a hostname, it sends a query to its local DNS server. The local DNS server consults its own cache to see if it already has the answer to the query. If the cache does not contain the answer, the local DNS server forwards the query to other DNS servers. Once the DNS server receives an answer, it can cache it before sending it to the client.

[Figure 2.3: Overview of the domain name system. A DNS client (resolver) asks its DNS name server to resolve host.cse.tkk.fi using a recursive query (step 1). The name server then exchanges queries and referrals (steps 2 to 5) with the root, fi, tkk.fi, and cse.tkk.fi name servers and returns the answer (step 6). The name tree in the figure shows root at the top with fi, com, and uk beneath it, and the labels tkk, cse, and helsinki further down.]

We can take the lookup for cse.tkk.fi as an example. The local DNS server first queries one of the public root nameservers to find the machines that are nameservers for the .fi domain. Then the local DNS server queries the .fi domain nameservers to determine the nameservers responsible for the tkk.fi domain. Finally, it queries the tkk.fi nameservers for the host or Web server IP address.

There are two main types of DNS activities: lookups and zone transfers. Lookups happen when a DNS client, or a DNS server acting on behalf of a client, queries a DNS server for information. Typically lookups involve finding the IP address for a given hostname, the hostname for a given IP address, the name server responsible for a given domain, or the mail server for a given host.

Zone transfers happen when a DNS server requests all records pertaining to a part of the DNS naming hierarchy (the zone) from another DNS server. The requesting DNS server is called the secondary server and the serving one is the primary server. Zone transfers are expected to happen only among servers that should be replicated. Since DNS knows the details of how a network is structured (the names and IP addresses), this information may need to be protected.

2.4 Addressing

The Internet is based on hierarchical routing, which is reflected in its addressing system.
The network addresses are divided into two parts, namely the network and host parts. The former defines the part of the network topology responsible for that address space, and the latter part defines the host. IPv4 has 32-bit addresses and the newer IPv6 extends this to 128 bits, which is expected to be sufficient for current needs.

In both IPv4 and IPv6 the addressing space is divided into variable-size prefixes. Originally, there were three prefix classes of A, B, and C corresponding to 8, 16, and 24 bits for the network part in an address. The limitation of this model was that each prefix appeared with host addresses included in global routing tables, resulting in scalability challenges. As a result of a growth crisis, classless interdomain routing (CIDR) was designed and deployed. CIDR supports provider-aggregated addresses by allowing a variable-length network part in an address. This allows better utilization of the existing address space, especially class B networks, and aggregation of routing table entries. CIDR has significantly reduced the size of the global routing tables, and it is used in IPv4 and IPv6 [1].

2.5 Routing

In this section, we briefly outline the basic routing process and then examine interdomain routing. We briefly present the border gateway protocol (BGP), examine some of the current challenges for BGP, and finally consider compact routing, which is a family of routing schemes that aim for scalability.

2.5.1 Overview

Routing in a static network is straightforward: each router determines directions for each possible destination. Routing in dynamic networks is more challenging, because the routing tables change and routing instructions need to be computed at runtime. The key questions are where the state resides and how often it needs to be updated. The common approach is to broadcast routing state to all routers, which is exemplified in link-state routing protocols that broadcast link-state updates that are used to compute shortest path distances.
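Once every router holds the same link-state database, each can compute shortest path distances locally. The text does not name an algorithm. The sketch below assumes Dijkstra's algorithm, a common choice for link-state protocols, and represents the database as a dictionary of per-router neighbour costs. The function name and data layout are illustrative only.

```python
import heapq


def shortest_path_distances(link_state, source):
    """Dijkstra's algorithm over a link-state database.

    link_state maps each router to {neighbour: link_cost}.
    """
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, router = heapq.heappop(queue)
        if dist > distances.get(router, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, cost in link_state[router].items():
            candidate = dist + cost
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances
```

Every change to a link cost means the database must be re-flooded and the computation repeated on every router, which is the scaling pressure that motivates dividing the network into routing domains.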
To avoid excessive flooding of link-state updates, the common solution is to divide the network into routing domains and use this hierarchy to limit the propagation of link-state updates. Areas are used extensively in OSPF, in which they are a network-dimensioning instrument. Hierarchies occur naturally in the interdomain context, in which autonomous systems reflect administrative boundaries.

A routing process is responsible for computing the forwarding table of a node. The routing process estimates the costs of incident links and communicates with its neighbors via these links. A routing algorithm is the mechanism that defines what information is exchanged with neighbors and how the forwarding tables are computed. The central purpose of a routing algorithm is to maintain a forwarding configuration in which nodes are mutually reachable by forwarding. It is often also desirable for the paths taken by forwarded packets to be optimal or near-optimal [197].

The Internet is based on hierarchical routing. The seminal work by Kleinrock and Kamoun, published in 1977, showed how hierarchical clustering can be used to produce scalable routing tables [187]. The key idea is to cluster nearby nodes together, then combine clusters into superclusters, and continue this in a bottom-up hierarchical manner. As a result, unnecessary topological information is abstracted away from the routing tables, and the network scales well. Hierarchical routing results in routing table sizes on the order of √n for a network of n nodes. Hierarchical routing is used today by a variety of protocols in both interdomain (BGP, CIDR) and intradomain routing (OSPF).
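As a simplified illustration of where a √n table size can come from (a two-level sketch under simplifying assumptions, not the general multi-level analysis of [187]), suppose the n nodes are split into k equal clusters and each node stores roughly one entry per node in its own cluster plus one entry per cluster. The symbols k and S(k) are introduced here only for this example.

```latex
\[
S(k) = \frac{n}{k} + k,
\qquad
\frac{dS}{dk} = -\frac{n}{k^{2}} + 1 = 0
\;\Longrightarrow\;
k = \sqrt{n},
\qquad
S\!\left(\sqrt{n}\right) = 2\sqrt{n}.
\]
```

For example, with n = 10,000 nodes a flat table needs about 10,000 entries, whereas the two-level arrangement with k = 100 clusters needs about 200.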
2.5.2 Interdomain

The interdomain structure has resulted from developments in both technology and business models. It is a mixture of technological advances and business decisions driven by investors and the stock market. A current trend has been toward massively popular content services on the Internet. This has created pressure toward better network support for data delivery and dissemination. The need to deliver vast amounts of data in an efficient and low-cost manner has given birth to CDNs and various peer-to-peer networks, such as BitTorrent networks.

CDNs charge for the data delivery service and are typically based on proprietary, closed solutions. BitTorrent and other peer-to-peer networks, however, rely on peer-assisted data exchange. The latter rely on low-cost, mostly flat-rate connections between end users and their providers. This new network behavior has resulted in various anti-peer-to-peer measures by Internet providers, partly due to the fact that many P2P protocols, such as BitTorrent, do not take interdomain policies into account and thus are not ISP friendly.

The core Internet architecture was not designed to serve as critical communication infrastructure for society. Therefore, the economic and political context must also be analyzed and understood. The current question is whether viable economic models exist for Internet service provision. Business modeling is complicated by the regulatory background, which varies by country. Telephone-carrier-based ISPs have been asking regulators for the ability to charge differentially, based on the application and content of traffic. This kind of discriminatory pricing may pose fundamental limitations for end users and limit their freedom.

Figure 2.4 illustrates interdomain routing with a number of autonomous systems. Overlay networks are implemented on top of the network layer topology as illustrated in the figure.
The current interdomain practice is based on three tiers, namely tiers 1, 2, and 3. A tier-1 network is an IP network that connects to the entire Internet using settlement-free peering. There are a small number of tier-1 networks that typically seek to protect their tier-1 status. A tier-2 network is a network that peers with some networks but relies on tier-1 for some connectivity, for which it pays settlements. A tier-3 network is a network that only purchases transit from other networks.

FIGURE 2.4 Example of interdomain routing. (Figure labels: transit AS10; stub AS20, AS30, and AS40; overlay nodes and regular nodes.)
The three main categories of AS relationships are as follows [143]: customer-to-provider (C2P), peer-to-peer (P2P), and sibling-to-sibling (S2S). In the C2P category, a customer AS pays a provider AS for any traffic sent between the two. In the P2P category, two domains can freely exchange traffic between themselves and their customers but do not exchange traffic from or to their providers or other peers. In the S2S category, two domains are part of the same organization and can freely exchange traffic between their providers, customers, peers, or other siblings.

Gao's work formulated the AS relationship inference problem. Gao assumed that every BGP path must comply with the following hierarchical pattern: an uphill segment of zero or more C2P or S2S links, followed by zero or one P2P links, followed by a downhill segment of zero or more P2C (provider-to-customer, the reverse direction of C2P) or S2S links. Paths with this hierarchical structure are called valley-free or valid. Paths that do not follow this hierarchical structure are called invalid and may result from BGP misconfiguration or from BGP policies that are more complex and do not distinctly fall into the C2P/P2P/S2S classification [143].

According to recent measurements, BGP tables miss up to 86.2% of the true AS adjacencies. The majority of these links are of the P2P type. This means that peering links are likely to be more dominant than has been previously reported or conjectured.

2.5.3 Border Gateway Protocol

The border gateway protocol (BGP) is responsible for connecting the different autonomous systems together, and it is the key protocol for building and maintaining the global routing table at interdomain routers. The current version of BGP is 4, and it incorporates support for CIDR and route aggregation to improve scalability (RFC 4271). BGP is realized as a manually configured overlay network that uses TCP connections between peers.
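Gao's valley-free pattern from Section 2.5.2 can be checked mechanically. The following Python sketch assumes that each link on an AS path has already been labeled with one of the hypothetical strings "C2P", "P2C", "P2P", or "S2S"; inferring those labels from BGP data is the hard part and is not shown.

```python
def is_valley_free(links):
    """Return True if the link types along an AS path match the pattern:
    (C2P | S2S)*  then at most one P2P  then (P2C | S2S)*."""
    phase = "uphill"
    for link in links:
        if link == "S2S":
            continue  # sibling links may appear in both the uphill and downhill segments
        if link == "C2P":
            if phase != "uphill":
                return False  # climbing again after the peak creates a valley
        elif link == "P2P":
            if phase != "uphill":
                return False  # at most one peering link, and only at the peak
            phase = "downhill"
        elif link == "P2C":
            phase = "downhill"
        else:
            raise ValueError(f"unknown link type: {link}")
    return True

valid_path = ["C2P", "C2P", "P2P", "P2C"]   # up, up, across, down
invalid_path = ["P2C", "C2P"]               # down and then up again
```

Here `is_valley_free(valid_path)` holds, whereas `invalid_path` descends to a customer and then climbs to a provider, which the pattern forbids.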
Routing updates propagate from peer to peer, and after receiving updates a BGP router updates its interdomain routing table based on the new information (the received path vectors). BGP keeps a table of IP networks that are reachable either through peering links or transit links. Each IP address, or prefix, is associated with a vector of AS numbers that indicates the ASes that need to be traversed to reach the destination prefix. BGP is described as a path vector protocol, since it is built on this notion of a vector of AS identifiers. Moreover, BGP does not use intradomain metrics such as latency to make routing decisions; instead it uses network policies and rule sets to decide what paths are used in routing and forwarding.

2.5.4 Current Challenges

As a central component of the Internet, BGP is at the heart of the network and thus faces increasing scalability challenges as the global network grows. BGP scalability concerns stem from the observation that each interdomain router is expected to maintain routing paths to all valid network prefixes. Currently, there are almost 3 × 10^5 prefixes [1] in the global routing table, and this number is expected to grow in the near future because of site multihoming and provider-independent addressing. In addition to the space requirements, routing table updates pose several challenges. One is the frequency with which changes are propagated in the global backbone. Another concern is routing update oscillation that may result from router misconfiguration.

One way to alleviate BGP scalability concerns is to separate path selection from packet forwarding. This is exemplified in the NIRA (a new interdomain routing architecture) proposal, which empowers users with the ability to choose a provider-level and domain-level end-to-end path [354]. The motivation for this is that only users know whether a path works or not. This model creates competition between the paths that different ISPs offer, because users can choose the most suitable paths. In this model, the network comprises three parts for each sender and receiver, namely the core region (tier-1), the uphill region that covers all possible paths from the sender to the core, and the downhill region covering all possible paths from the core down to the receiver. Each region can have its own routing protocols.

Another recent proposal, the accountable internet protocol (AIP), replaces the subnet prefix in IP packets with a self-certifying autonomous system identifier and a suffix that is a self-certifying host identifier [9]. The key idea is to support domain-level routing instead of the current prefix-based routing. The motivation is that there are fewer autonomous systems than network prefixes. The proposal also combines domain-level routing with security by using self-certified identifiers that make it easier to hold network entities accountable. The host identifiers are expected to be unique, which would support host mobility and multihoming in a seamless way.

2.5.5 Compact Routing

As mentioned above, BGP faces significant scalability challenges, and recent measurements indicate that both the size of routing tables and the communication cost are increasing exponentially [190]. Prefix optimization techniques, such as CIDR, do not appear to be the most efficient solutions in the long run, since they offer only a constant reduction in routing table sizes and do not change the scaling behaviour of the network. Compact routing has been proposed as a candidate solution for decreasing routing table sizes and improving network scalability.
A routing scheme is said to be compact when it results in logarithmic address and header sizes, sublinear routing table sizes, and a stretch bounded by a constant. Compact routing schemes can be divided into two categories, specialized and universal. The former work only on some specific graphs, and the latter work on all graphs.

It has been shown that the classic link state, distance vector, and path vector routing algorithms exhibit routing table sizes on the order of n log(n) [144] with stretch 1 (stretch being the worst-case ratio of the routed path length to the shortest path length). Moreover, hierarchical routing performs well only for graphs where large distances between nodes dominate. A universal stretch-1 compact routing algorithm also requires routing tables of size Ω(n log(n)) [144]. One interpretation of this result is that shortest-path routing is incompressible, and to obtain smaller routing tables the stretch must be allowed to increase above 1.

The Cowen and the Thorup-Zwick schemes are two well-known nonhierarchical stretch-3 compact routing schemes. These name-dependent schemes utilize a set of landmarks to constrain updates and keep routing table sizes minimal. A routing table consists of entries for the shortest paths to all landmarks and to the nodes in the local cluster [144].

2.6 Multicast

Unicast is the dominant communication model for Internet applications. Multicast is the process of sending data from, typically, one sender to multiple receivers. This usually involves the creation of a multicast tree that is either source specific or shared by the communicating entities. In general, the creation of an optimal multicast tree is equivalent to the Steiner tree problem, which is known to be NP-complete. This problem resembles the minimum spanning tree problem; however, it considers only how to reach a specific subset of the nodes [348].
The multicast function can be implemented at the network level or in the application layer. Network-level multicast complements unicast as a basic networking primitive. Application-layer multicast, on the other hand, typically utilizes unicast. In this section, we first examine IP multicast and then consider overlay multicast techniques.

2.6.1 Network-layer Multicast

Multicast is essentially a one-to-many data delivery mechanism. Network-layer (or IP) multicast provides the multicast capability in the form of special multicast address ranges that are used by network routers to connect senders and receivers. Multicast differs significantly from unicast in that it decouples the senders and receivers. Moreover, since there may be a number of receivers for a multicast data packet, the network can optimize the transmission by replicating packets at the last possible moment in the network.

IP multicast is a simple, scalable, and efficient mechanism to realize simple group-based communication. IP multicast routes IP packets from one sender to multiple receivers. Participants join and leave the group by sending a packet using the IGMP (RFC 1112) protocol to a well-known group multicast address. The key components of IP multicast are

• IP multicast group address
• A multicast distribution tree maintained by routers
• Receiver-driven tree creation

In order to receive multicast packets, receivers join a specific IP multicast group. A multicast distribution tree is constructed and maintained by routers for the group. All packets sent to the multicast IP address are then delivered by the multicast protocol to all receivers that have joined the group.

A multicast protocol is responsible for maintaining multicast trees that connect the members of multicast groups. There are two main categories of multicast algorithms, namely source-based trees and shared trees.
The former is rooted at the router serving the source of multicast packets. This means that a tree is needed for each source; however, the trees can be optimal in terms of some metric. The latter is rooted at a specific router, called a rendezvous point (RP) or a core, that is responsible for maintaining the tree. In this case, the source sends data packets to the RP, which then is responsible for disseminating the data using the tree. The RP can then perform pruning operations on the tree to optimize the traffic.

The Internet group management protocol (IGMP) is a protocol designed to allow the management of IP multicast group memberships. IGMP is used by IP hosts and adjacent multicast routers to establish and maintain multicast groups. According to RFC 3171, addresses 224.0.0.0 to 239.255.255.255 are designated as multicast addresses. IGMP messages are carried directly in IP datagrams (IP protocol number 2), whereas multicast application data is commonly sent over UDP. IP multicast, as IP in general, is not reliable, and messages may be lost or delivered out of sequence.

There are many different IP multicast protocols. Protocol-independent multicast (PIM) is a frequently used protocol that supports several different operating modes, namely sparse mode, dense mode, source-specific mode, and bidirectional mode. Several reliable multicast protocols have been developed, for example, pragmatic general multicast (PGM), which extends IP multicast with loss detection and retransmission.

IP multicast groups are not very expressive. They partition the IP datagram address space, and each datagram belongs to at most one group. Moreover, IP multicast is a best-effort, unreliable service, and for many applications a reliable transport service is needed.
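The multicast address range quoted above, 224.0.0.0 to 239.255.255.255, is the CIDR block 224.0.0.0/4. As a small sketch using Python's standard `ipaddress` module, a membership test can be written as follows; the helper name `is_ipv4_multicast` and the sample addresses are illustrative assumptions.

```python
import ipaddress

# 224.0.0.0 through 239.255.255.255 expressed as a single CIDR prefix.
MULTICAST_BLOCK = ipaddress.ip_network("224.0.0.0/4")

def is_ipv4_multicast(address):
    """Return True if the dotted-quad address falls in the IPv4 multicast range."""
    return ipaddress.ip_address(address) in MULTICAST_BLOCK

first_address = str(MULTICAST_BLOCK[0])    # lowest address in the block
last_address = str(MULTICAST_BLOCK[-1])    # highest address in the block
```

The same check is available directly as the `is_multicast` property of an address object; the explicit prefix is used here only to make the range visible.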
Multicast works well in closed networks; however, in large public networks multicast or broadcast may not be practical. In these environments universally adopted standards such as TCP/IP and HTTP may be better choices for all communication [168].

2.6.2 Application-layer Multicast

Given that IPv4 is still the prevailing network layer protocol and that it does not offer a native multicast mechanism, it is common to implement multicast on top of the TCP/IP protocol stack in the form of application-layer (or overlay) multicast.

IP multicast requires routers to maintain per-group state or per-source state for each multicast group. A routing table entry is needed for each unique multicast group address, and the multicast addresses are not easily aggregated. Moreover, IP multicast still requires additional reliability and congestion control solutions.

Therefore, there is motivation for developing and deploying overlay multicast solutions. Indeed, many of the systems discussed later in this book are examples of these. In this section, we briefly outline the key motivation for application-layer multicast and its differences from network-layer multicast.

An application-layer multicast system typically uses unicast communication between nodes to realize one-to-many communications. This means that nodes establish communications using either UDP or TCP and forward messages using these links. Data packets are replicated by the end hosts. These protocols may not be as efficient as IP multicast, because data may be sent multiple times over the same link. As an example, in a previous version of the Gnutella P2P protocol, one link was observed to be utilized six times for the same data [273]. The multicast tree construction algorithm is typically distributed and can take various metrics into account.
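The observation that the same data may cross one link several times can be quantified by counting, for each physical link, how many overlay edges are routed over it. The following Python sketch does this for a hypothetical topology; the host and router names, the overlay tree, and the underlay paths are assumptions made for illustration.

```python
from collections import Counter

def link_stress(overlay_edges, underlay_path):
    """Count how many copies of one multicast packet cross each physical link.

    overlay_edges: list of (sender, receiver) unicast hops in the overlay tree.
    underlay_path: maps each overlay edge to the list of nodes on its IP path.
    """
    stress = Counter()
    for edge in overlay_edges:
        hops = underlay_path[edge]
        for a, b in zip(hops, hops[1:]):
            stress[frozenset((a, b))] += 1  # links are treated as undirected
    return stress

# Hypothetical overlay tree: H1 sends to H2, and H2 relays to H3.
OVERLAY_EDGES = [("H1", "H2"), ("H2", "H3")]
UNDERLAY_PATH = {
    ("H1", "H2"): ["H1", "R1", "R2", "H2"],
    ("H2", "H3"): ["H2", "R2", "H3"],
}

stress = link_stress(OVERLAY_EDGES, UNDERLAY_PATH)
```

In this example the access link between R2 and H2 carries the packet twice, once inbound and once when H2 relays it, whereas a network-layer multicast tree would use each link at most once.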
Figure 2.5 compares IP multicast and overlay multicast in the following categories: deployment, structure, transport, scalability, congestion control, and efficiency [174]. In terms of deployment, IP multicast requires multicast-capable routers, whereas overlay multicast is based on hosts and can thus be deployed easily over the Internet. Both approaches are based on trees, with the difference being that in IP multicast hosts do not participate in the tree other than as leaves. As mentioned, IP multicast is not widely deployed and hence its scalability is limited. It is, however, efficient, whereas overlay solutions may not utilize optimal paths and may incur more overhead.

Category | IP multicast | Overlay multicast
Deployment | Multicast-capable routers | Deployed over the Internet
Multicast structure | Tree, interior nodes are routers, leaves are hosts | Typically a tree, both interior nodes of the structure and leaves are hosts
Example protocols | Protocol-independent multicast (PIM), core-based trees (CBT), etc. | BitTorrent variants, Scribe, SplitStream, OverCast, etc.
Efficiency | High | Low (varies), can suffer from high stretch and suboptimal interdomain routing
Congestion control/recovery | No | Various, can utilize unicast (TCP) for node-to-node reliability
Scalability | Limited | High (depends on solution)
Transport layer protocol | UDP | TCP or UDP

FIGURE 2.5 Comparison of IP and overlay multicast.

2.6.3 Chaining TCP Connections for Multicast

Intuition suggests that overlay multicast typically incurs a performance penalty over IP multicast because of factors such as link stress, stretch factor, and end host packet processing. For example, early versions of the Gnutella P2P protocol used TCP, but later versions replaced it with UDP for performance reasons. Chains of TCP connections can nevertheless offer an opportunity to increase performance compared to direct unicast. This performance improvement comes from finding an alternative overlay path whose narrowest hop in the chain (as perceived by TCP) is wider than the default path used by IP [192].

The expected TCP throughput as a function of the per-hop loss rates and RTTs can be modeled using the following equation derived in [247]:

\[ T = \frac{s}{rtt\left(\sqrt{\frac{2p}{3}} + 12\sqrt{\frac{3p}{8}}\,p\,(1 + 32p^{2})\right)} \approx \frac{\sqrt{1.5}}{rtt\,\sqrt{p}} \qquad (2.1) \]

This provides an estimate of the expected throughput T of a TCP connection in bytes/sec as a function of the packet size s, the measured round-trip time rtt, and the steady-state loss event rate p. The right-hand approximation omits the factor s, so it expresses the rate in packets rather than bytes.

A given hop in a chain of TCP connections either has local network conditions that limit its rate to a value below that of the upstream connections or is already limited by the rate of the upstream connections. Following the methodology used in [361], the aggregate RTT is defined as the sum of the per-hop values $rtt_i$ along the path, $\sum_i rtt_i$, and the aggregate loss rate is defined as $1 - \prod_i (1 - p_i)$ (assuming uncorrelated losses).
\[ T \approx \frac{\sqrt{1.5}}{\left(\sum_i rtt_i\right)\sqrt{1 - \prod_i (1 - p_i)}} \qquad (2.2) \]

2.7 Network Coordinates

The latency of network communications is an important metric for choosing routes and peers on the network. This raises the question of how accurately latency can be predicted without prior communication. Recent network measurement systems indicate that latency prediction is feasible based on synthetic network coordinates [91, 101, 320, 349]. A network coordinate system might be used to select from among a number of replicated servers to request a file.

Vivaldi is a distributed algorithm that assigns synthetic coordinates to Internet hosts. It uses the Euclidean distance between the coordinates of two hosts to predict the network latency between them. In this system, each node computes its coordinates by simulating its position in a network of physical springs. The system does not require fixed infrastructure, and a new host can compute useful coordinates after obtaining latency information from some other hosts [101].
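As a minimal sketch of the replica selection use case mentioned above, the following Python code predicts latency as the Euclidean distance between synthetic coordinates and picks the replica with the smallest predicted latency. The two-dimensional coordinates, the server names, and the function names are hypothetical; how the coordinates are obtained is left out.

```python
import math

def predicted_latency(coord_a, coord_b):
    """Predicted latency is the Euclidean distance between two coordinates."""
    return math.dist(coord_a, coord_b)

def closest_replica(client_coord, replica_coords):
    """Pick the replica whose coordinates are nearest to the client, without probing."""
    return min(
        replica_coords,
        key=lambda name: predicted_latency(client_coord, replica_coords[name]),
    )

# Hypothetical synthetic coordinates (units can be read as milliseconds).
CLIENT = (0.0, 0.0)
REPLICAS = {
    "server-a": (30.0, 40.0),
    "server-b": (6.0, 8.0),
    "server-c": (-50.0, 0.0),
}

best = closest_replica(CLIENT, REPLICAS)
```

With these values the predicted latencies are 50, 10, and 50, so `server-b` is chosen. The choice is only as good as the coordinates; no measurement to the candidates is made at selection time.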
2.7.1 Vivaldi Centralized Algorithm

When formulated as a centralized algorithm, the input to Vivaldi is a matrix of real network latencies M, such that $M_{xy}$ is the latency between x and y. The output is a set of coordinates. Finding the best coordinates is equivalent to minimizing the error (E) between predicted distances and the supplied distances. Vivaldi uses a simple squared error function:

\[ E = \sum_{x}\sum_{y}\left(M_{xy} - \mathrm{dist}(x, y)\right)^{2}, \qquad (2.3) \]

where dist(x, y) is the standard Euclidean distance between the coordinates of x and y. Vivaldi places a spring between each pair of nodes for which it knows the network latency, with the rest length set to that latency. The length of each spring is the distance between the current coordinates of the two nodes. The potential energy of a spring is proportional to the square of the displacement from its rest length; this displacement is identical to the prediction error of the coordinates. Therefore, minimizing the potential energy of the spring system corresponds to minimizing the prediction error E.

Vivaldi simulates the physical spring system by running the system through a series of small time steps. At each time step, the force on each node is calculated and the node moves in the direction of that force. The node moves a distance proportional to the applied force and the size of the time step. Each time a node moves it decreases the energy of the system; however, the energy of the system stored in the springs will typically never reach zero, since network latencies do not reflect a Euclidean space. Neither the spring relaxation nor some of the other solutions, such as the simplex algorithm, is guaranteed to find the global minimum. Simulating spring relaxation requires much less computation than more general optimization algorithms.

2.7.2 Vivaldi Distributed Algorithm

In the distributed version of Vivaldi, each node simulates a piece of the overall spring system.
A node maintains an estimate of its own current coordinates, starting at the origin. Whenever two nodes communicate, the two nodes measure the latency between them and exchange their current synthetic coordinates. In RPC-based systems, this measurement can be accomplished by timing the RPC; in a stream-oriented system, the receiver might echo a timestamp.

Once a measurement is obtained, both nodes adjust their coordinates to reduce the mismatch between the measured latency and the coordinate distance. A node moves its coordinates toward a point p along the line between it and the other node. The point p is chosen to be the point that reduces the difference between the predicted and measured latency between the two nodes to zero.

To avoid oscillation, a node moves its coordinates only a fraction δ toward p. A node initializes δ to 1.0 when it starts and reduces it each time it updates its coordinates. Vivaldi starts with a large δ to allow a node to move quickly toward good coordinates and ends up with a small δ to avoid oscillation. If two nodes have the same coordinates (the origin, for instance), they each choose a random direction in which to move. Algorithm 2.1 illustrates the update procedure.

Algorithm 2.1 Pseudocode for the Vivaldi update procedure

Data: sc is the other host's coordinates, sl is the one-way latency to that host, myc is this node's own coordinates; the initial value of δ is 1.0.

Function: update(sc, sl)
    /* Unit vector toward other host */
    Vector dir = sc − myc
    dir = dir / length(dir)
    /* Distance from spring's rest position */
    d = dist(sc, myc) − sl
    /* Displacement from rest position */
    Vector x = dir ∗ d
    /* Reduce δ at each sample */
    δ = δ − 0.025
    /* Stop at 0.05 */
    δ = max(0.05, δ)
    x = x ∗ δ
    /* Apply the force */
    myc = myc + x

2.7.3 Applications

A modified Chord DHT (presented in Chapter 5) uses network coordinates to efficiently build routing tables based on proximity so that lookups are likely to proceed to nearby nodes. A node receives a list of candidate nodes and selects the one that is closest in coordinate space as its routing table entry; coordinates allow the node to make this decision without probing each candidate.

The modified Chord utilizes coordinates when performing an iterative lookup. When a node n1 initiates a lookup and routes it through some node n2, n2 chooses a next hop that is close to n1 based on Vivaldi coordinates. In an iterative lookup, n1 sends an RPC to each intermediate node in the route, so proximity to n1 is more important than proximity to n2.

2.7.4 Triangle Inequality Violation

For a network coordinate system to work, it needs to properly reflect the latencies between network hosts. When neighbour or peer selection is based on brute-force network measurements, the quality of the selection cannot be affected by triangle inequality violations (TIV); however, when the number of nodes grows, performing these brute-force measurements may not be feasible. Then it is preferable to use a delay measurement system such as the network coordinates discussed above. The potential challenge in using these systems is their assumption that the triangle inequality holds in the delay space [340].

Any three nodes on the Internet A, B, and C form a triangle ABC. Edge AC is considered to cause a triangle inequality violation if d(A, B) + d(B, C) < d(A, C), where d(X, Y) is the measured delay between X and Y.
The triangulation ratio of the violation caused by AC in triangle ABC is defined as d(A, C) / (d(A, B) + d(B, C)). It has been demonstrated that TIVs can cause significant errors in latency estimation based on network coordinate systems. As a potential remedy, a TIV alert mechanism has been proposed that identifies edges with severe TIVs [340].
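The TIV condition and the triangulation ratio defined above translate directly into code. The following Python sketch scans a small delay table and reports, for each edge that causes a violation, the worst triangulation ratio over all intermediate nodes. The node names, the delays, and the function name `tiv_edges` are hypothetical; this is not the alert mechanism of [340].

```python
from itertools import combinations

def tiv_edges(nodes, delay):
    """Return {(A, C): worst triangulation ratio} for every edge AC that causes a TIV.

    delay maps a sorted node pair (X, Y) to the measured, symmetric delay d(X, Y).
    """
    def d(x, y):
        return delay[tuple(sorted((x, y)))]

    worst = {}
    for a, c in combinations(sorted(nodes), 2):
        for b in nodes:
            if b in (a, c):
                continue
            detour = d(a, b) + d(b, c)
            if detour < d(a, c):                  # triangle inequality violated by AC
                ratio = d(a, c) / detour          # triangulation ratio, always > 1 here
                worst[(a, c)] = max(worst.get((a, c), 0.0), ratio)
    return worst

# Hypothetical measured delays in milliseconds.
NODES = ["A", "B", "C"]
DELAY = {("A", "B"): 20, ("B", "C"): 30, ("A", "C"): 100}

violations = tiv_edges(NODES, DELAY)
```

Here the detour through B takes 50 ms while the direct edge AC takes 100 ms, so AC causes a violation with triangulation ratio 2.0 and is the only edge reported.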
2.8 Network Metrics

In this section, we examine metrics that characterize various properties of networks. Our focus is, in particular, on metrics that are useful in the design and deployment of overlay networks. We have already touched on this issue when discussing routing.

First, we briefly consider routing algorithm invariants, which are crucial for ensuring that the algorithms perform according to their specifications. These invariants and properties do not assess how well the paths that a routing algorithm maintains in a routing table perform. Therefore a number of metrics are needed to understand the quality of the paths, the state of the routers and nodes, and the properties of the network. We elaborate on the following metrics: shortest path, routing table size, path stretch, forwarding load, churn, and several other metrics.

2.8.1 Routing Algorithm Invariants

The correctness and performance of a routing algorithm can be analyzed using a number of metrics. Typically it is expected that a routing algorithm satisfies certain invariant properties at all times. The two key properties are safety and liveness. The former states that undesired effects do not occur; in other words, the algorithm works correctly. The latter states that the algorithm continues to work correctly; for example, it avoids deadlocks and loops. These properties can typically be proven for a given routing algorithm under certain assumptions.

Safety and liveness can also be specified in terms of soundness and completeness [197] for a routing configuration. A configuration is sound if it includes paths for all node pairs that are reachable (have a path) after the network becomes quiet. A degenerate form of this configuration is one in which all nodes are unreachable. Completeness is used to ensure that all paths in the network are included in the configuration.
Together these properties say that all nodes are reachable through the routing and forwarding system; however, they do not determine how optimal the paths are. Therefore, additional metrics are needed to assess the quality of the paths.

2.8.2 Convergence

Soundness and completeness (or safety and liveness) do not consider how quickly the routing algorithm works or converges when the network changes. They only ensure that, from the viewpoint of the system invariants, the operation is correct. Indeed, convergence cost and time are important metrics for different kinds of routing systems, including overlay algorithms.

The dynamics of peers joining and leaving an overlay system is called churn, and it is an inherent property of P2P systems. Peer participation is highly dynamic. Typically, a large part of the active peers are stable and the remaining peers change quickly [312]. This means that P2P overlay networks must be designed in such a way that they tolerate high levels of churn. Indeed, many of the algorithms presented in Chapter 6 tolerate churn.

2.8.3 Shortest Path

The goal of a routing algorithm is to find the shortest paths between two nodes, A and B, that are reachable through the network. In order to do this, we need a metric for calculating these shortest paths and then create routing tables that reflect the paths according to distance. OSPF is an example of an intradomain protocol that computes shortest paths using link-state routing. On the other hand, BGP is an example of a policy-based routing protocol that calculates shortest paths based on policies and AS hops instead of, say, delay.

In general, the shortest path length between two nodes A and B is the minimum number of edges that must be traversed to reach B from A. The average path length is the average of the shortest path lengths between any two nodes. The average path length is a metric of the number of hops needed to reach a node from a given source.

2.8.4 Routing Table Size and Stretch

We can observe two conflicting goals in the design of routing algorithms, namely that the network paths used by a given router should be as short as possible and, at the same time, the routing table should be as small as possible. The two key metrics are the optimality of the paths and the size of the routing tables.

The efficiency of a routing algorithm is measured in terms of its stretch factor, that is, the maximum ratio between the length (or delay) of a route computed by the algorithm and that of a shortest path (or delay) connecting the same pair of nodes [251].

Stretch signifies the degree of achieved performance relative to the optimal choice. For overlay systems, there is an inherent overhead compared to IP routing, with the benefit of deployability and scalability. The treatment for overlay multicast is a bit more challenging. Typically, the benchmark IP multicast tree would be assumed to consist of the optimal unicast paths. We can extend the notion of stretch to a multicast overlay tree as follows.

Stretch for a multicast overlay tree is the ratio of the number of network-layer hops (or delay) in the path from the sender to a receiver in the multicast overlay tree to the number of hops (or delay) required by the shortest unicast path between these two nodes, averaged over all trees and paths.

In addition to stretch, the routing table size is the other important metric.
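The path metrics defined above can be computed on a small example. The following Python sketch uses breadth-first search to obtain hop counts, then derives the average path length and the stretch of one particular route. The four-node ring topology and the function names are hypothetical, and the graph is assumed to be connected and undirected.

```python
from collections import deque
from itertools import combinations

def hop_counts(graph, source):
    """Breadth-first search: minimum number of edges from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_path_length(graph):
    """Average of the shortest path lengths over all unordered node pairs."""
    pairs = list(combinations(graph, 2))
    return sum(hop_counts(graph, a)[b] for a, b in pairs) / len(pairs)

def route_stretch(route, graph):
    """Ratio of a route's hop count to the shortest hop count between its endpoints."""
    shortest = hop_counts(graph, route[0])[route[-1]]
    return (len(route) - 1) / shortest

# Hypothetical ring of four nodes: A - B - C - D - A.
RING = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}

avg = average_path_length(RING)
detour_stretch = route_stretch(["A", "B", "C", "D"], RING)
```

In the ring, a route from A to D that goes the long way round takes three hops where one would do, giving a stretch of 3.0. The stretch factor of a routing algorithm, as defined above, is the maximum of this ratio over all node pairs.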
The routing table should hold only a fraction of the nodes in the network, and the routing algorithm should not require global information about the nodes. For overlay networks, the aim is to support routing tables whose size is sublinear in the number of nodes in the network (and the number of items in the network). The routing table data structure should also be efficiently realizable in hardware and software. These two metrics are in conflict, and a routing algorithm needs to balance the size of the routing table against the optimality of the paths.

2.8.5 Forwarding Load

Another important metric is the forwarding load placed on routers in terms of packets, connections, and messages. For an IP router, forwarding load is measured in terms of incoming packets and the incurred per-packet delay. IP routers use hardware or software routing tables to look up the outgoing interface for a packet given the packet’s destination prefix. If a router cannot handle all incoming packets, its queues will fill up and it will start to drop packets. This congestion is then handled at the edge of the network, following the end-to-end principle, and congestion avoidance is implemented in transport layer protocols, exemplified by the TCP congestion control algorithm.

© 2010 Taylor and Francis Group, LLC
Router forwarding load is therefore handled mostly at the edge for TCP/IP; however, overlay nodes are typically end hosts themselves, which makes stress an issue that has to be taken into consideration when designing an overlay algorithm. For an overlay node, forwarding load can be viewed as the amount of traffic the node is processing at a particular time or during a time interval. This traffic has several components, namely control traffic pertaining to how the overlay network is structured (neighbors, super nodes, etc.) and the actual content.

In a multicast system, forwarding load can be expressed in terms of the branching factor (or replication factor) of each node. For overlay multicast systems, the load incurred from multicast forwarding compared to network level forwarding can be defined as the number of identical packets sent by a node. For network layer multicast there is no redundant packet replication; however, an overlay multicast scheme may result in a number of unnecessary packet replications (called false positives).

2.8.6 Churn

Churn is a metric that is especially pertinent for P2P overlay systems. Churn pertains to the rate of arrivals and departures in the system. Typically, a large part of the active peers are stable and the remaining peers come and go. P2P overlay networks must be designed in such a way that they tolerate high levels of churn. Recent analysis of churn indicates that, overall, its characteristics are remarkably similar across different systems [271, 312].

Churn is an inherent property of P2P systems and describes the dynamics of peer arrival and departure. High churn means that the system is highly dynamic, with peers coming and going. Two metrics have been commonly used for churn in file-sharing systems, namely a node’s session time and lifetime. The session time is the duration between the node joining the network and then subsequently leaving it.
The lifetime is the time between when the node first entered the network and when it left the network permanently. These two metrics are depicted in Figure 2.6. The availability of a node can be defined as the sum of the node’s session times divided by its lifetime. In one study, it has been argued that the session times of nodes in a DHT are more relevant than their lifetimes [271].

FIGURE 2.6 Session time in churn. (Timeline along a Time axis marking Join and Leave events and the overall Lifetime.)

2.8.7 Other Metrics

Other important metrics that characterize a network include:
• Network diameter, which is the maximum shortest-path distance between any pair of nodes.
• Node degree, which is the number of links that the node has to other nodes in an undirected graph. The degree distribution is connected with the robustness of the network to node failures.
• Locality-awareness and the properties of data, which are important for data lookup overlays and CDNs.
• Policy compliancy, which is important for routing that takes place across organization boundaries. BGP is the classic example of a policy-based routing protocol.
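The graph metrics from Sections 2.8.3 and 2.8.7 can be computed directly on a small topology. The following is a minimal sketch that assumes a connected, undirected, unweighted graph stored as an adjacency dictionary. It reports the average shortest-path length and, separately, the longest shortest path, which is the usual graph-theoretic diameter. The six-node ring and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def hop_counts(graph, source):
    """Breadth-first search: shortest hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_metrics(graph):
    """Return (average shortest-path length, longest shortest path) over all node pairs.

    Assumes the graph is connected; an unreachable pair would raise KeyError.
    """
    lengths = [hop_counts(graph, a)[b] for a, b in combinations(sorted(graph), 2)]
    return sum(lengths) / len(lengths), max(lengths)


def node_degrees(graph):
    """Number of links each node has to other nodes."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


# Hypothetical six-node ring: every node is linked to its two neighbors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

average_length, longest = path_metrics(ring)  # 1.8 hops on average, 3 hops at most
degrees = node_degrees(ring)                  # every node has degree 2
```

Running one breadth-first search per pair keeps the sketch short; for larger graphs one search per source node would avoid the repeated work.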
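The churn metrics of Section 2.8.6 (session time, lifetime, and availability) can be sketched as follows. The sketch assumes that a node's history is available as a time-ordered list of (join, leave) pairs in arbitrary time units; the sample history and the function names are hypothetical.

```python
def session_times(sessions):
    """Duration of each session, i.e. the time between a join and the following leave."""
    return [leave - join for join, leave in sessions]


def lifetime(sessions):
    """Time between the first join and the final, permanent leave."""
    return sessions[-1][1] - sessions[0][0]


def availability(sessions):
    """Sum of the node's session times divided by its lifetime."""
    return sum(session_times(sessions)) / lifetime(sessions)


# Hypothetical node history in hours: three sessions separated by offline periods.
history = [(0, 4), (10, 12), (18, 20)]

online = session_times(history)  # [4, 2, 2]
span = lifetime(history)         # 20
ratio = availability(history)    # 8 / 20
```

A node that never leaves between its first join and final departure has a single session equal to its lifetime, so its availability is 1.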
3 Properties of Networks and Data

This chapter examines the salient properties of networks and of the data communicated over them. We start with a characterization of data on the current Internet and discuss the growth rate of the global network. Both the geographical and the logical distribution of data are crucial when creating overlay networks over the Internet that ensure efficient data availability. We discuss the role of power-laws and small-worlds in networking.

In order to engineer efficient overlay systems, a lot of information is needed pertaining to the underlying network, the nodes and their characteristics, and the properties of the data that they subscribe to, publish, and seek. This calls for various models, including models of the actual traffic distributions on the Internet (including their spatial and temporal characteristics), models of host connectivity, models of the dynamics of churn, and so on. In this chapter, we outline some of the fundamental characteristics of overlay networks.

3.1 Data on the Internet

We are currently in the era of the exabyte in terms of annual IP traffic [78] and entering the era of the zettabyte (10^21 bytes). Cisco’s latest traffic forecast for 2009–2013 indicates that annual global IP traffic will reach 667 exabytes in 2013 [79]. The traffic is expected to increase by some 40% per annum. Much of this increase comes from the delivery of video data in various forms.

Figure 3.1 presents Cisco’s forecast estimates for monthly global IP traffic until 2011. According to these estimates, the Internet is growing fast. We can compare this estimate with the situation in 2005, when the global traffic was a bit over 2000 petabytes per month. The forecast predicts an approximately eightfold increase in monthly traffic volume.

Figure 3.2 compares monthly traffic estimates for a number of content providers. The growth of data-intensive services is evident in the amount of traffic transmitted per month.
We observe that Google and YouTube have by far the greatest bandwidth requirements. According to Figure 3.2, the estimated US traffic for these two services in mid-2007 was nearly twice the traffic of the US Internet backbone at year end 1998, and the worldwide estimate was over seven times larger. This gives an idea of the radical growth of the Internet in the last 10 years.

3.1.1 Video Delivery

Video delivery on the Internet is anticipated to see a huge increase, and the volume of video delivery is expected to be 700 times the capacity of the US Internet backbone in 2000. Cisco’s study anticipates that video traffic will account for 91% of all consumer traffic in 2013.
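The growth figures quoted in Section 3.1 can be cross-checked with simple compound-growth arithmetic. The sketch below assumes steady year-over-year growth, which the forecasts themselves do not claim, and the function names are illustrative. It derives the annual rate implied by an eightfold increase between 2005 and 2011 and the doubling time implied by 40% annual growth.

```python
import math


def implied_annual_growth(factor, years):
    """Constant annual growth rate that yields `factor` total growth over `years` years."""
    return factor ** (1 / years) - 1


def doubling_time(rate):
    """Years needed for traffic to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


# Eightfold increase in monthly traffic between 2005 and 2011 (six years).
rate_2005_2011 = implied_annual_growth(8, 2011 - 2005)  # roughly 0.41, i.e. about 41% a year

# Doubling time at the 40% annual growth quoted for the 2009-2013 forecast.
years_to_double = doubling_time(0.40)  # roughly 2.06 years
```

Under this simplifying assumption the two forecasts are consistent with each other: an eightfold increase over six years corresponds to about 41% growth per year.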
FIGURE 3.1 Cisco’s global IP traffic forecast estimates 2005–2011. (Chart of petabytes per month, on a scale from 0 to 20000, for each year from 2005 to 2011.)

The increasing video-related traffic creates a number of challenges for network engineering. Video files are typically large, and with the advent of high-definition content they will be even larger. This means that even a small adoption of a video delivery technology can result in significant shifts in traffic patterns. This unpredictability of traffic patterns makes network provisioning more difficult and may result in decreased quality of service for customers [145].

Flash crowds contribute to the unpredictability of the network. A flash crowd happens when a certain video or Web site becomes, typically unexpectedly, massively popular [119]. Flash crowds can be alleviated by using content replication and caching schemes. Another technique frequently used by Web sites is to detect unexpected demand for content and simplify the Web content to make it smaller.

Video delivery poses new challenges and opportunities for Internet service providers. Video consumes bandwidth, and with the emergence of flat rates consumers do not pay per megabyte. Moreover, the content may come from anywhere on the Internet, which may result in increased interdomain traffic charges for the ISP. This means that service revenue is no longer related to the connectivity revenue.
FIGURE 3.2 Monthly traffic estimates for content services (terabytes per month).
• BBC (UK), April 2007: 196
• Yahoo (UK), April 2007: 353
• Time Warner (US), May 2007: 1129
• ABC, NBC, ESPN, Disney (US), May 2007: 1854
• Yahoo (US), May 2007: 2361
• iTunes audio and video downloads (2006): 3500
• Google (UK), April 2007: 3709
• MySpace (US), May 2007: 4148
• US Internet backbone at year end 1998: 6000
• Google and YouTube (US), May 2007: 10956
• Google and YouTube (worldwide, mid-2007 Cisco estimate): 45750
3.1.2 P2P Traffic

According to the Cisco study, peer-to-peer traffic will continue to grow in volume but will account for a smaller share of Internet traffic than it does today. P2P systems in 2009 transfer 3.3 exabytes of data per month. The study indicates that the P2P share of consumer Internet traffic will drop to 20% by 2013, down from 50% at the end of 2008.

Even though the P2P share may diminish, most video delivery solutions, which account for much of the traffic increase, will utilize overlay technologies. This makes the area crucial for ensuring efficient and scalable services.

3.1.3 Trends in Networking

Figure 3.3 presents a number of significant trends in IP networking and outlines their challenges and potential solutions. Current trends include P2P, Internet broadcast, both Internet and commercial video-on-demand (VoD), and high-definition content.

P2P presents a number of challenges for IP networks because it increases traffic and utilizes upstream capacity for data exchange. This changes the customary usage of the network, in which downstream dominates the traffic model. Therefore, IP networks need to be provisioned in such a way that possible upstream bottlenecks are eliminated. Caching can be seen as a potential solution to P2P traffic. Indeed, many current P2P protocols are able to take network proximity into account so that data can be obtained from a nearby P2P node.

Internet broadcast pertains to the dissemination of large media files or streams. Flash crowds are challenging because they make it difficult to provision the network in such a way that it can handle the expected demand for the content. This can be alleviated by using P2P content distribution technologies and multicast technologies. Since there is no global IP multicast support available, network layer multicast can be used only in specific networks, such as metropolitan area networks or wireless access networks.
Internet VoD is becoming increasingly popular, and thus the growth of the traffic is a challenge for the network. This mostly affects the metropolitan area networks and the core networks. The solutions include CDNs and increasing the network capacity. Data compression can also be used to reduce the size of the media files, and VoD content can be cached. Commercial VoD is typically delivered in the metropolitan area network, which needs to be provisioned accordingly. The core network is not burdened much by commercial VoD, because the content can be replicated to the relevant metropolitan area networks.

High-definition content also poses challenges, because the higher quality radically increases the amount of data that needs to be transferred. Access networks are constrained by their IP television (IPTV) solution. CDNs, increased network capacity, and compression are potential remedies.

FIGURE 3.3 Trends, challenges, and potential solutions for IP traffic.
• P2P. Challenges: growth in traffic, upstream bottlenecks. Solutions: P2P caching.
• Internet broadcast. Challenges: flash crowds. Solutions: P2P content distribution, multicast technologies.
• High-definition content. Challenges: access network IPTV bottleneck, growth in VoD traffic volume in the metropolitan area network. Solutions: CDNs, increasing network capacity, compression.
• Commercial video-on-demand. Challenges: growth in traffic in the metropolitan area network. Solutions: CDNs, increasing network capacity, compression.
• Internet video-on-demand. Challenges: growth in traffic, especially metropolitan area and core. Solutions: Content Delivery Networks (CDNs), increasing network capacity, compression.

3.2 Zipf’s Law

A power-law implies that small occurrences are extremely common, whereas large instances are extremely rare. This regularity or law is also referred to as Zipf or Pareto. Zipf’s law is interesting for networked systems, because it has been shown that many different activities follow this law, for example, query distributions and Web site popularity. The linguist George Zipf first proposed the law in 1935 in the context of word frequencies in languages. For Web sites, the Zipf law means that large sites get disproportionately more traffic than smaller sites.

In this section, we give an overview of the Zipf distribution and two related distributions, namely Pareto and power-law distributions. Then we briefly discuss the implications for the Internet and P2P.

3.2.1 Overview

The Zipf distribution is concerned with the ranking of objects based on their popularity. The ranking is done by assigning the most popular object the rank of one, the second most popular object a rank of two, and so on. Zipf’s law states that if objects are ranked according to the frequency of occurrence, the frequency of occurrence F is related to the rank of the object R according to the relation

$F \sim R^{-\beta}$,    (3.1)

where the exponent β is a constant close to one.

The simplest verification of the applicability of Zipf’s law is to plot the rank-ordered list of objects versus the frequency of the object on a log-log scale.
On a log-log scale, the observance of a straight line is indicative of the applicability of Zipf’s law. The Zipf distribution and power-law distributions are directly related, and they are different ways of looking at the same phenomenon [5]. Zipf is used to model rank distributions and power-law to model frequency distributions.

The Zipf distribution is related to the Pareto distribution. Pareto was interested in the distribution of income, with the question of how many people have an income greater than x. Pareto’s law is defined in terms of the cumulative distribution function (CDF). The Pareto distribution gives the probability that a person’s income is greater than or equal to x:

$P[X \geq x] \sim x^{-k}$.    (3.2)

A power-law distribution in its typical usage tells the number of people whose income is exactly x rather than how many people have an income greater than x. It is the probability density function (PDF) associated with the CDF given by Pareto’s law:

$P[X = x] \sim x^{-(k+1)}$,    (3.3)

where k is the Pareto distribution shape parameter.
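The log-log check described above can be sketched in a few lines. The example assumes idealized, noise-free frequencies generated from Equation (3.1) and estimates the slope of the rank-frequency line with an ordinary least-squares fit; the function names, the scale constant, and the choice of 100 ranks are illustrative assumptions rather than anything taken from measured data.

```python
import math


def zipf_frequencies(count, beta=1.0, scale=1000.0):
    """Idealized frequencies F = scale * R**(-beta) for ranks R = 1..count."""
    return [scale * rank ** -beta for rank in range(1, count + 1)]


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Frequencies that follow Zipf's law exactly with beta = 1.
frequencies = zipf_frequencies(100, beta=1.0)

# On a log-log scale the points lie on a straight line with slope -beta.
slope = loglog_slope(frequencies)
```

With real measurements, such as observed query counts, the points scatter around the line, and the fitted slope gives only an estimate of β.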
hereinafter mentioned, in so far as concerns them, on condition that the special stipulations contained in this article are fulfilled by Germany.

Postal Conventions:
Conventions and agreements of the Universal Postal Union concluded at Vienna, July 4, 1891.
Conventions and agreements of the Postal Union signed at Washington, June 15, 1897.
Conventions and agreements of the Postal Union signed at Rome May 26, 1906.

Telegraphic Conventions:
International Telegraphic Conventions signed at St. Petersburg July 10 (22), 1875.
Regulations and tariffs drawn up by the International Telegraphic Conference, Lisbon, June 11, 1908.

Germany undertakes not to refuse her assent to the conclusion by the new States of the special arrangements referred to in the conventions and agreements relating to the Universal Postal Union and to the International Telegraphic Union, to which the said new States have adhered or may adhere.

ARTICLE 284. From the coming into force of the present treaty the high contracting parties shall apply, in so far as concerns them, the International Radio-Telegraphic Convention of July 5, 1912, on
condition that Germany fulfills the provisional regulations which will be indicated to her by the Allied and Associated Powers.

If within five years after the coming into force of the present treaty a new convention regulating international radio-telegraphic communications should have been concluded to take the place of the convention of July 5, 1912, this new convention shall bind Germany even if Germany should refuse either to take part in drawing up the convention or to subscribe thereto. This new convention will likewise replace the provisional regulations in force.

ARTICLE 285. From the coming into force of the present treaty the high contracting parties shall apply in so far as concerns them and under the conditions stipulated in Article 272 the conventions hereinafter mentioned:
1. The conventions of May 6, 1882, and Feb. 1, 1889, regulating the fisheries in the North Sea outside territorial waters.
2. The conventions and protocols of Nov. 16, 1887, Feb. 14, 1893, and April 11, 1894, regarding the North Sea liquor traffic.

ARTICLE 286. The International Convention of Paris of March 20, 1883, for the protection of industrial property, revised at Washington on June 2, 1911; the International Convention of Berne of Sept. 9, 1886, for the protection of literary and artistic works, revised at Berlin on Nov. 13, 1908, and completed by the additional protocol signed at Berne on March 20, 1914, will again come into effect as from the coming into force of the present treaty, in so far as they are not affected or modified by the exceptions and restrictions resulting therefrom.
ARTICLE 287. From the coming into force of the present treaty the high contracting parties shall apply, in so far as concerns them, the Convention of the Hague of July 17, 1905, relating to civil procedure. This renewal, however, will not apply to France, Portugal and Rumania.

ARTICLE 288. The special rights and privileges granted to Germany by Article 3 of the convention of Dec. 2, 1899, relating to Samoa shall be considered to have terminated on Aug. 4, 1914.

ARTICLE 289. Each of the Allied or Associated Powers, being guided by the general principles or special provisions of the present treaty, shall notify to Germany the bilateral treaties or conventions which such Allied or Associated Power wishes to revive with Germany.

The notification referred to in the present article shall be made either directly or through the intermediary of another power. Receipt thereof shall be acknowledged in writing by Germany. The date of the revival shall be that of the notification.

The Allied and Associated Powers undertake among themselves not to revive with Germany any conventions or treaties which are not in accordance with the terms of the present treaty.

The notification shall mention any provisions of the said conventions and treaties which, not being in accordance with the terms of the present treaty, shall not be considered as revived.

In case of any difference of opinion, the League of Nations will be called on to decide.

A period of six months from the coming into force of the present treaty is allowed to the Allied and Associated Powers within which to make the notification.
Only those bilateral treaties and conventions which have been the subject of such a notification shall be revived between the Allied and Associated Powers and Germany; all the others are and shall remain abrogated.

The above regulations apply to all bilateral treaties or conventions existing between all the Allied and Associated Powers signatories to the present treaty and Germany, even if the said Allied and Associated Powers have not been in a state of war with Germany.

ARTICLE 290. Germany recognizes that all the treaties, conventions, or agreements which she has concluded with Austria, Hungary, Bulgaria, or Turkey since Aug. 1, 1914, until the coming into force of the present treaty are and remain abrogated by the present treaty.

ARTICLE 291. Germany undertakes to secure to the Allied and Associated Powers, and to the officials and nationals of the said powers, the enjoyment of all the rights and advantages of any kind which she may have granted to Austria, Hungary, Bulgaria, or Turkey, or to the officials and nationals of these States by treaties, conventions, or arrangements concluded before Aug. 1, 1914, so long as those treaties, conventions, or arrangements remain in force.

The Allied and Associated Powers reserve the right to accept or not the enjoyment of these rights and advantages.

ARTICLE 292. Germany recognizes that all treaties, conventions, or arrangements which she concluded with Russia or with any State or Government of which the territory previously formed a part of Russia, or with Rumania before Aug. 1, 1914, or after that date until the coming into force of the present treaty, are and remain abrogated.
ARTICLE 293. Should an Allied or Associated Power, Russia, or a State or Government of which the territory formerly constituted a part of Russia have been forced since Aug. 1, 1914, by reason of military occupation or by any other means or for any other cause, to grant or to allow to be granted by the act of any public authority, concessions, privileges, and favors of any kind to Germany or to a German national, such concessions, privileges, and favors are ipso facto annulled by the present treaty.

No claims or indemnities which may result from this annulment shall be charged against the Allied or Associated Powers or the powers, States, Governments, or public authorities which are released from their engagements by the present article.

ARTICLE 294. From the coming into force of the present treaty Germany undertakes to give the Allied and Associated Powers and their nationals the benefit ipso facto of the rights and advantages of any kind which she has granted by treaties, conventions or arrangements to non-belligerent States or their nationals since Aug. 1, 1914, until the coming into force of the present treaty so long as those treaties, conventions, or arrangements remain in force.

ARTICLE 295. Those of the high contracting parties who have not yet signed, or who have signed but not yet ratified, the Opium Convention signed at The Hague on Jan. 23, 1912, agree to bring the said convention into force, and for this purpose to enact the necessary legislation without delay and in any case within a period of twelve months from the coming into force of the present treaty.

Furthermore, they agree that ratification of the present treaty should in the case of powers which have not yet ratified the Opium Convention be deemed in all respects equivalent to the ratification of
that convention and to the signature of the special protocol which was opened at The Hague in accordance with the resolutions adopted by the Third Opium Conference in 1914 for bringing the said convention into force.

For this purpose the Government of the French Republic will communicate to the Government of the Netherlands a certified copy of the protocol of the deposit of ratifications of the present treaty, and will invite the Government of the Netherlands to accept and deposit the said certified copy as if it were a deposit of ratifications of the Opium Convention and a signature of the additional protocol of 1914.

SECTION III. Debts.

ARTICLE 296. There shall be settled through the intervention of clearing offices to be established by each of the high contracting parties within three months of the notification referred to in paragraph (e) hereafter the following classes of pecuniary obligations:
1. Debts payable before the war and due by a national of one of the contracting powers, residing within its territory, to a national of an opposing power, residing within its territory.
2. Debts which became payable during the war to nationals of one contracting power residing within its territory and arose out of transactions or contracts with the nationals of an opposing power, resident within its territory, of which the total or partial execution was suspended on account of the declaration of war.
3. Interest which has accrued due before and during the war to a national of one of the contracting powers in respect of securities issued by an opposing power, provided that the payment of
interest on such securities to the nationals of that power or to neutrals has not been suspended during the war.
4. Capital sums which have become payable before and during the war to nationals of one of the contracting powers in respect of securities issued by one of the opposing powers, provided that the payment of such capital sums to nationals of that power or to neutrals has not been suspended during the war.
[Photograph: M. Stephen Pichon, chosen Chairman of the provisional organization of the League of Nations in recognition of his long leadership, not only in France but internationally, in the work of bringing about a world-wide organization to preserve peace. Copyright Harris Ewing.]
The proceeds of liquidation of enemy property, rights, and interests mentioned in Section IV. and in the annex thereto will be accounted for through the clearing offices, in the currency and at the rate of exchange hereinafter provided in Paragraph (d), and disposed of by them under the conditions provided by the said section and annex.

The settlements provided for in this article shall be effected according to the following principles and in accordance with the annex to this section:

a. Each of the high contracting parties shall prohibit, as from the coming into force of the present treaty, both the payment and the acceptance of payment of such debts, and also all communications between the interested parties with regard to the settlement of the said debts otherwise than through the clearing offices.

b. Each of the high contracting parties shall be respectively responsible for the payment of such debts due by its nationals, except in the cases where before the war the debtor was in a state of bankruptcy or failure, or had given formal indication of insolvency or where the debt was due by a company whose business has been liquidated under emergency legislation during the war. Nevertheless, debts due by the inhabitants of territory invaded or occupied by the enemy before the armistice will not be guaranteed by the States of which those territories form part.
c. The sums due to the nationals of one of the high contracting parties by the nationals of an opposing State will be debited to the clearing office of the country of the debtor, and paid to the creditor by the clearing office of the country of the creditor.

d. Debts shall be paid or credited in the currency of such one of the Allied and Associated Powers, their colonies or protectorates, or the British Dominions or India, as may be concerned. If the debts are payable in some other currency they shall be paid or credited in the currency of the country concerned, whether an Allied or Associated Power, colony, protectorate, British Dominion, or India, at the pre-war rate of exchange.

For the purpose of this provision the pre-war rate of exchange shall be defined as the average cable transfer rate prevailing in the Allied or Associated country concerned during the month immediately preceding the outbreak of war between the said country concerned and Germany.

If a contract provides for a fixed rate of exchange governing the conversion of the currency in which the debt is stated into the currency of the Allied or Associated country concerned, then the above provisions concerning the rate of exchange shall not apply.

In the case of new States the currency in which and the rate of exchange at which debts shall be paid or credited shall be determined by the Reparation Commission provided for in Part VIII. (Reparation.)
e. The provisions of this article and of the annex thereto shall not apply as between Germany on the one hand and any one of the Allied and Associated Powers, their colonies or protectorates, or any one of the British Dominions or India on the other hand, unless within a period of one month from the deposit of the ratifications of the present treaty by the power in question, or of the ratification on behalf of such dominion or of India, notice to that effect is given to Germany by the Government of such Allied or Associated Power or of such Dominion or of India as the case may be.

f. The Allied and Associated Powers who have adopted this article and the annex hereto may agree between themselves to apply them to their respective nationals established in their territory so far as regards matters between their nationals and German nationals. In this case the payments made by application of this provision will be subject to arrangements between the allied and associated clearing offices concerned.

ANNEX

1. Each of the high contracting parties will, within three months from the notification provided for in Article 296, Paragraph (e), establish a clearing office for the collection and payment of enemy debts.

Local clearing offices may be established for any particular portion of the territories of the high contracting parties. Such local clearing offices may perform all the functions of a central clearing office in their respective districts, except that all
transactions with the clearing office in the opposing State must be effected through the central clearing office.

2. In this annex the pecuniary obligations referred to in the first paragraph of Article 296 are described as enemy debts, the persons from whom the same are due as enemy debtors, the persons to whom they are due as enemy creditors, the clearing office in the country of the creditor is called the Creditor Clearing Office, and the clearing office in the country of the debtor is called the Debtor Clearing Office.

3. The high contracting parties will subject contraventions of Paragraph (a) of Article 296 to the same penalties as are at present provided by their legislation for trading with the enemy. They will similarly prohibit within their territory all legal process relating to payment of enemy debts, except in accordance with the provisions of this annex.

4. The Government guarantee specified in Paragraph (b) of Article 296 shall take effect whenever, for any reason, a debt shall not be recoverable, except in a case where at the date of the outbreak of war the debt was barred by the laws of prescription in force in the country of the debtor, or where the debtor was at that time in a state of bankruptcy or failure or had given formal indication of insolvency, or where the debt was due by a company whose business has been liquidated under emergency legislation during the war. In such case the procedure specified by this annex shall apply to payment of the dividends.
The terms bankruptcy and failure refer to the application of legislation providing for such juridical conditions. The expression formal indication of insolvency bears the same meaning as it has in English law.

5. Creditors shall give notice to the Creditor Clearing Office within six months of its establishment of debts due to them, and shall furnish the Clearing Office with any documents and information required of them.

The high contracting parties will take all suitable measures to trace and punish collusion between enemy creditors and debtors. The clearing offices will communicate to one another any evidence and information which might help the discovery and punishment of such collusion.

The high contracting parties will facilitate as much as possible postal and telegraphic communication at the expense of the parties concerned and through the intervention of the clearing offices between debtors and creditors desirous of coming to an agreement as to the amount of their debt.

The Creditor Clearing Office will notify the Debtor Clearing Office of all debts declared to it. The Debtor Clearing Office will, in due course, inform the Creditor Clearing Office which debts are admitted and which debts are contested. In the latter case the Debtor Clearing Office will give the grounds for the non-admission of debt.
  • 69. 6. When a debt has been admitted, in whole or in part, the Debtor Clearing Office will at once credit the Creditor Clearing Office with the amount admitted, and at the same time notify it of such credit. 7. The debt shall be deemed to be admitted in full and shall be credited forthwith to the Creditor Clearing Office unless within three months from the receipt of the notification or such longer time as may be agreed to by the Creditor Clearing Office notice has been given by the Debtor Clearing Office that it is not admitted. 8. When the whole or part of a debt is not admitted the two clearing offices will examine into the matter jointly, and will endeavor to bring the parties to an agreement. 9. The Creditor Clearing Office will pay to the individual creditor the sums credited to it out of the funds placed at its disposal by the Government of its country and in accordance with the conditions fixed by the said Government, retaining any sums considered necessary to cover risks, expenses, or commissions. 10. Any person having claimed payment of an enemy debt which is not admitted in whole or in part shall pay to the clearing office by way of fine interest at 5 per cent. on the part not admitted. Any person having unduly refused to admit the whole or part of a debt claimed from him shall pay by way of fine interest at 5 per cent. on the amount with regard to which his refusal shall be disallowed.
  • 70. Such interest shall run from the date of expiration of the period provided for in Paragraph 7 until the date on which the claim shall have been disallowed or the debt paid. Each clearing office shall in so far as it is concerned take steps to collect the fines above provided for, and will be responsible if such fines cannot be collected. The fines will be credited to the other clearing office, which shall retain them as a contribution toward the cost of carrying out the present provisions. 11. The balance between the clearing offices shall be struck monthly, and the credit balance paid in cash by the debtor State within a week. Nevertheless, any credit balances which may be due by one or more of the Allied and Associated Powers shall be retained until complete payment shall have been effected of the sums due to the Allied or Associated Powers or their nationals on account of the war. 12. To facilitate discussion between the clearing offices each of them shall have a representative at the place where the other is established. 13. Except for special reasons all discussions in regard to claims will, so far as possible, take place at the Debtor Clearing Office.
  • 71. 14. In conformity with Article 296, Paragraph (b), the high contracting parties are responsible for the payment of the enemy debts owing by their nationals. The Debtor Clearing Office will therefore credit the Creditor Clearing Office with all debts admitted, even in case of inability to collect them from the individual debtor. The Governments concerned will, nevertheless, invest their respective clearing offices with all necessary powers for the recovery of debts which have been admitted. As an exception the admitted debts owing by persons having suffered injury from acts of war shall only be credited to the Creditor Clearing Office when the compensation due to the person concerned in respect of such injury shall have been paid. 15. Each Government will defray the expenses of the clearing office set up in its territory, including the salaries of the staff. 16. Where the two clearing offices are unable to agree whether a debt claimed is due, or in case of a difference between an enemy debtor and an enemy creditor, or between the clearing offices, the dispute shall either be referred to arbitration if the parties so agree under conditions fixed by agreement between them, or referred to the mixed arbitral tribunal provided for in Section VI. hereafter. At the request of the Creditor Clearing Office the dispute may, however, be submitted to the jurisdiction of the courts of the place of domicile of the debtor.
  • 72. 17. Recovery of sums found by the Mixed Arbitral Tribunal, the court, or the arbitration tribunal to be due shall be effected through the clearing offices as if these sums were debts admitted by the Debtor Clearing Office. 18. Each of the Governments concerned shall appoint an agent who will be responsible for the presentation to the mixed arbitral tribunal of the cases conducted on behalf of its clearing office. This agent will exercise a general control over the representatives or counsel employed by its nationals. Decisions will be arrived at on documentary evidence, but it will be open to the tribunal to hear the parties in person, or, according to their preference, by their representatives approved by the two Governments, or by the agent referred to above, who shall be competent to intervene along with the party or to reopen and maintain a claim abandoned by the same. 19. The clearing offices concerned will lay before the mixed arbitral tribunal all the information and documents in their possession, so as to enable the tribunal to decide rapidly on the cases which are brought before it. 20. Where one of the parties concerned appeals against the joint decision of the two clearing offices he shall make a deposit against the costs, which deposit shall only be refunded when the first judgment is modified in favor of the appellant and in proportion to the success he may attain, his opponent in case of such a refund being required to pay an equivalent proportion of
  • 73. the costs and expenses. Security accepted by the tribunal may be substituted for a deposit. A fee of 5 per cent. of the amount in dispute shall be charged in respect of all cases brought before the tribunal. This fee shall, unless the tribunal directs otherwise, be borne by the unsuccessful party. Such fee shall be added to the deposit referred to. It is also independent of the security. The tribunal may award to one of the parties a sum in respect of the expenses of the proceedings. Any sum payable under this paragraph shall be credited to the clearing office of the successful party as a separate item. 21. With a view to the rapid settlement of claims, due regard shall be paid in the appointment of all persons connected with the clearing offices or with the Mixed Arbitral Tribunal to their knowledge of the language of the other country concerned. Each of the clearing offices will be at liberty to correspond with the other, and to forward documents in its own language. 22. Subject to any special agreement to the contrary between the Governments concerned, debts shall carry interest in accordance with the following provisions: Interest shall not be payable on sums of money due by way of dividend, interest, or other periodical payments which themselves represent interest on capital.
  • 74. The rate of interest shall be 5 per cent. per annum except in cases where, by contract, law, or custom, the creditor is entitled to payment of interest at a different rate. In such cases the rate to which he is entitled shall prevail. Interest shall run from the date of commencement of hostilities (or, if the sum of money to be recovered fell due during the war, from the date at which it fell due) until the sum is credited to the clearing office of the creditor. Sums due by way of interest shall be treated as debts admitted by the clearing offices and shall be credited to the Creditor Clearing Office in the same way as such debts. 23. Where by decision of the clearing offices or the Mixed Arbitral Tribunal a claim is held not to fall within Article 296, the creditor shall be at liberty to prosecute the claim before the courts or to take such other proceedings as may be open to him. The presentation of a claim to the clearing office suspends the operation of any period of prescription. 24. The high contracting parties agree to regard the decisions of the Mixed Arbitral Tribunal as final and conclusive, and to render them binding upon their nationals. 25. In any case where a Creditor Clearing Office declines to notify a claim to the Debtor Clearing Office, or to take any step provided for in this annex, intended to make effective in whole or in part a request of which it has received due notice, the enemy creditor
  • 75. shall be entitled to receive from the clearing office a certificate setting out the amount of the claim, and shall then be entitled to prosecute the claim before the courts or to take such other proceedings as may be open to him. SECTION IV.—Property, Rights, and Interests ARTICLE 297.—The question of private property, rights, and interests in an enemy country shall be settled according to the principles laid down in this section and to the provisions of the annex hereto: a. The exceptional war measures and measures of transfer (defined in Paragraph 3 of the annex hereto) taken by Germany with respect to the property, rights, and interests of nationals of Allied or Associated Powers, including companies and associations in which they are interested, when liquidation has not been completed, shall be immediately discontinued or stayed and the property, rights, and interests concerned restored to their owners, who shall enjoy full rights therein in accordance with the provisions of Article 298. b. Subject to any contrary stipulations which may be provided for in the present treaty, the Allied and Associated Powers reserve the right to retain and liquidate all property, rights, and interests belonging at the date of the coming into force of the present treaty to German nationals, or companies controlled by them, within their territories, colonies, possessions, and protectorates including territories ceded to them by the present treaty.
  • 76. The liquidation shall be carried out in accordance with the laws of the Allied or Associated State concerned, and the German owner shall not be able to dispose of such property, rights, or interests nor to subject them to any charge without the consent of that State. German nationals who acquire ipso facto the nationality of an Allied or Associated Power in accordance with the provisions of the present treaty will not be considered as German nationals within the meaning of this paragraph. c. The price or the amount of compensation in respect of the exercise of the right referred to in the preceding Paragraph (b) will be fixed in accordance with the methods of sale or valuation adopted by the laws of the country in which the property has been retained or liquidated. d. As between the Allied and Associated Powers or their nationals on the one hand and Germany or her nationals on the other hand, all the exceptional war measures, or measures of transfer, or acts done or to be done in execution of such measures as defined in Paragraphs 1 and 3 of the annex hereto shall be considered as final and binding upon all persons except as regards the reservations laid down in the present treaty. e. The nationals of Allied and Associated Powers shall be entitled to compensation in respect of damage or injury inflicted upon their property, rights, or interests including any company or association in which they are interested, in German territory as it existed on Aug. 1, 1914, by the application either of the
  • 77. exceptional war measures or measures of transfer mentioned in Paragraphs 1 and 3 of the annex hereto. The claims made in this respect by such nationals shall be investigated, and the total of the compensation shall be determined by the Mixed Arbitral Tribunal provided for in Section VI, or by an arbitrator appointed by that tribunal. This compensation shall be borne by Germany, and may be charged upon the property of German nationals, within the territory or under the control of the claimant's State. This property may be constituted as a pledge for enemy liabilities under the conditions fixed by Paragraph 4 of the annex hereto. The payment of this compensation may be made by the Allied or Associated State, and the amount will be debited to Germany. f. Whenever a national of an Allied or Associated Power is entitled to property which has been subjected to a measure of transfer in German territory and expresses a desire for its restitution, his claim for compensation in accordance with Paragraph (e) shall be satisfied by the restitution of the said property if it still exists in specie. In such case Germany shall take all necessary steps to restore the evicted owner to the possession of his property, free from all incumbrances or burdens with which it may have been charged after the liquidation, and to indemnify all third parties injured by the restitution. If the restitution provided for in this paragraph cannot be effected, private agreements arranged by the intermediation of the powers concerned or the clearing offices provided for in the
  • 78. Annex to Section III. may be made, in order to secure that the national of the Allied or Associated Power may secure compensation for the injury referred to in Paragraph (e) by the grant of advantages or equivalents which he agrees to accept in place of the property, rights or interests of which he was deprived. Through restitution in accordance with this article the price or the amount of compensation fixed by the application of Paragraph (e) will be reduced by the actual value of the property restored, account being taken of compensation in respect of loss of use or deterioration. g. The rights conferred by Paragraph (f) are reserved to owners who are nationals of Allied or Associated Powers within whose territory legislative measures prescribing the general liquidation of enemy property, rights or interests were not applied before the signature of the armistice. h. Except in cases where, by application of Paragraph (f), restitutions in specie have been made, the net proceeds of sales of enemy property, rights or interests wherever situated carried out either by virtue of war legislation, or by application of this article, and in general all cash assets of enemies, shall be dealt with as follows: 1. As regards powers adopting Section III. and the annex thereto, the said proceeds and cash assets shall be credited to the power of which the owner is a national, through the clearing office established thereunder; any credit balance in
  • 79. favor of Germany resulting therefrom shall be dealt with as provided in Article 243. 2. As regards powers not adopting Section III. and the annex thereto, the proceeds of the property, rights and interests, and the cash assets, of the nationals of Allied or Associated Powers held by Germany shall be paid immediately to the person entitled thereto or to his Government; the proceeds of the property, rights and interests, and the cash assets, of German nationals received by an Allied or Associated Power shall be subject to disposal by such power in accordance with its laws and regulations and may be applied in payment of the claims and debts defined by this article or Paragraph 4 of the annex hereto. Any property, rights and interests or proceeds thereof or cash assets not used as above provided may be retained by the said Allied or Associated Power and if retained the cash value thereof shall be dealt with as provided in Article 243. In the case of liquidations effected in new States, which are signatories of the present treaty as Allied and Associated Powers, or in States which are not entitled to share in the reparation payments to be made by Germany, the proceeds of liquidations effected by such States shall, subject to the rights of the Reparation Commission under the present treaty, particularly under Articles 235 and 260, be paid direct to the owner. If on the application of that owner the Mixed Arbitral Tribunal, provided for by Section VI. of this part or an arbitrator appointed by that tribunal, is satisfied that the conditions of the sale or measures taken by the
  • 80. Government of the State in question outside its general legislation were unfairly prejudicial to the price obtained, they shall have discretion to award to the owner equitable compensation to be paid by that State. i. Germany undertakes to compensate its nationals in respect of the sale or retention of their property, rights or interests in Allied or Associated States. j. The amount of all taxes and imposts upon capital levied or to be levied by Germany on the property, rights, and interests of the nationals of the Allied or Associated Powers from the 11th of November, 1918, until three months from the coming into force of the present treaty, or, in the case of property, rights or interests which have been subjected to exceptional measures of war, until restitution in accordance with the present treaty, shall be restored to the owners. ARTICLE 298.—Germany undertakes, with regard to the property, rights and interests, including companies and associations in which they were interested, restored to nationals of Allied and Associated Powers in accordance with the provisions of Article 297, Paragraph (a) or (f): a. to restore and maintain, except as expressly provided in the present treaty, the property, rights, and interests of the nationals of Allied or Associated Powers in the legal position obtaining in respect of the property, rights, and interests of German nationals under the laws in force before the war.
  • 81. b. not to subject the property, rights, or interests of the nationals of the Allied or Associated Powers to any measures in derogation of property rights which are not applied equally to the property, rights, and interests of German nationals, and to pay adequate compensation in the event of the application of these measures. ANNEX 1. In accordance with the provisions of Article 297, Paragraph (d), the validity of vesting orders and of orders for the winding up of businesses or companies, and of any other orders, directions, decisions, or instructions of any court or any department of the Government of any of the high contracting parties made or given, or purporting to be made or given, in pursuance of war legislation with regard to enemy property, rights, and interests is confirmed. The interests of all persons shall be regarded as having been effectively dealt with by any order, direction, decision, or instruction dealing with property in which they may be interested, whether or not such interests are specifically mentioned in the order, direction, decision, or instruction. No question shall be raised as to the regularity of a transfer of any property, rights, or interests dealt with in pursuance of any such order, direction, decision, or instruction. Every action taken with regard to any property, business, or company, whether as regards its investigation, sequestration, compulsory administration, use, requisition, supervision, or winding up, the sale or management of property, rights, or interests, the collection or discharge of debts, the payment of costs, charges or expenses, or any other matter whatsoever, in pursuance of orders, directions, decisions, or instructions of any court or of
  • 82. any department of the Government of any of the high contracting parties, made or given, or purporting to be made or given in pursuance of war legislation with regard to enemy property, rights or interests, is confirmed. Provided that the provisions of this paragraph shall not be held to prejudice the titles to property heretofore acquired in good faith and for value and in accordance with the laws of the country in which the property is situated by nationals of the Allied and Associated Powers. The provisions of this paragraph do not apply to such of the above-mentioned measures as have been taken by the German authorities in invaded or occupied territory, nor to such of the above-mentioned measures as have been taken by Germany or the German authorities since Nov. 11, 1918, all of which shall be void. 2. No claim or action shall be made or brought against any Allied or Associated Power or against any person acting on behalf of or under the direction of any legal authority or department of the Government of such a power by Germany or by any German national wherever resident in respect of any act or omission with regard to his property, rights, or interests during the war or in preparation for the war. Similarly no claim or action shall be made or brought against any person in respect of any act or omission under or in accordance with the exceptional war measures, laws, or regulations of any Allied or Associated Power.
  • 83. 3. In Article 297 and this Annex the expression exceptional war measures includes measures of all kinds, legislative, administrative, judicial, or others, that have been taken or will be taken hereafter with regard to enemy property, and which have had or will have the effect of removing from the proprietors the power of disposition over their property, though without affecting the ownership, such as measures of supervision, of compulsory administration, and of sequestration; or measures which have had or will have as an object the seizure of, the use of, or the interference with enemy assets, for whatsoever motive, under whatsoever form or in whatsoever place. Acts in the execution of these measures include all detentions, instructions, orders or decrees of Government departments or courts applying these measures to enemy property, as well as acts performed by any person connected with the administration or the supervision of enemy property, such as the payment of debts, the collecting of credits, the payment of any costs, charges, or expenses, or the collecting of fees. Measures of transfer are those which have affected or will affect the ownership of enemy property by transferring it in whole or in part to a person other than the enemy owner, and without his consent, such as measures directing the sale, liquidation, or devolution of ownership in enemy property, or the canceling of titles or securities. 4. All property, rights, and interests of German nationals within the territory of any Allied or Associated Power and the net proceeds of their sale, liquidation or other dealing therewith