QuaP2P P2P Tutorial 2006

Overview on P2P Principles
Kalman Graffi
DFG Research Group QuaP2P, Technische Universität Darmstadt

Overview
  Motivation
  Peer-to-Peer principle
  Overlay networks
  Unstructured P2P networks
    Centralized P2P network
    Distributed P2P network
  Structured DHT-based P2P networks
    Chord
    Content Addressable Network

Motivation
  A huge number of nodes participate in the network
    They have resources to share
    They have demands for resources they do not have
  Main question: which of the nodes hold the resources I want?
  Peer-to-Peer (P2P) as solution
    P2P offers mechanisms to find / look up what I want
    Therefore: build an additional overlay network
    After finding the node providing the desired service: communicate directly from peer to peer

Peer-to-Peer Properties
  Heterogeneous peers
    Reliable? Permanent? Connectivity of individual peers cannot be assumed
  Peers offer and consume services and resources
    Services are exchanged between any participating peers
  Peers (= end systems) form an overlay network
  Peers have significant autonomy
    Self-organizing system
  [Figure: overlay connection and service delivery]

Overlay Networks
  Overlay network: a network built ON TOP of one or more existing networks (e.g. an IP network)
    Adds an additional layer of abstraction (indirection / virtualization)
    Provides sophisticated services (search, look-up)
  Both the P2P overlay and the IP network
    have their own addressing scheme
    provide routing functionality
    are based on the end-to-end principle
  [Figure: Project JXTA virtual network, showing peers in the overlay network on top of TCP/IP underlay networks. Picture adapted from Traversat et al.]

Overlay Networks: Properties
  Advantages:
    New layer speeds up search / lookup of requested information
    Allows for bootstrapping
    Makes use of the existing environment by adding a new layer
    Not all nodes have to participate in maintenance
      But free riding is still a problem
  Disadvantages:
    Overhead: additional layer in the networking stack
    Complexity
      Layering does not eliminate complexity, it only manages it
      Misleading behavior, unintended interaction between layers
    Redundancy: features may be available at various layers
  Two types of P2P overlay networks: unstructured and structured

Structured and Unstructured P2P Systems
  Unstructured P2P networks
    Objects have no special identifier
    Location of a desired object is not known a priori
    Each peer is responsible only for its own objects
    Search: find all (or some) objects in the P2P network which fit given criteria
  Structured P2P networks
    Peers and objects have identifiers, strict topology
    Objects are stored on peers according to their ID: responsibleFor(ObjID) = PeerID
    Distributed indexing points to the object location
    Lookup / addressing: retrieve the object which is identified by a given identifier

Unstructured P2P: Centralized
(classification from R. Schollmeier and J. Eberspächer, TU München; generations in the figure: 1st Generation, 2nd Generation, 3rd Generation)
  Centralized P2P (unstructured)
    1. Central entity is necessary to provide the service
    2. Central entity is some kind of index database
    3. Search costs: O(1)
    4. Costs for state: O(n)
    5. For: searches
  Pure P2P (unstructured)
    1. Any terminal entity can be removed without loss of functionality
    2. No central entities, fully distributed
    3. Search costs: O(n)
    4. Costs for state: O(1)
    5. For: searches
  Hybrid P2P (unstructured)
    1. Any terminal entity can be removed without loss of functionality
    2. Dynamic central entities for faster search
    3. Search costs: variable
    4. Costs for state: variable
    5. For: searches
  DHT-based (structured)
    1. Any terminal entity can be removed without loss of functionality
    2. No central entities, fully distributed
    3. “Fixed” connections in the overlay network
    4. Search costs: O(log n)
    5. Costs for state: O(log n)
    6. For: lookup

Unstructured Centralized P2P Systems
  Principle: a central server stores information about locations
    1. Publish contents at own peer, tell server: node A (provider) tells the server that it stores item D
    2. Send query for desired object: requester (B) asks server S for the location of D; server S tells B that node A stores item D
    3. P2P communication, get contents: node B requests item D from node A
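The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `IndexServer` and `Peer` and their methods are hypothetical, and the "network" is just in-process method calls.

```python
class IndexServer:
    """Hypothetical central index: maps an item name to the peers storing it."""

    def __init__(self):
        self._index = {}  # item name -> set of peer names

    def publish(self, peer_name, item):
        # Step 1: a provider tells the server that it stores the item.
        self._index.setdefault(item, set()).add(peer_name)

    def query(self, item):
        # Step 2: the server answers with the known providers (may be empty).
        return sorted(self._index.get(item, ()))


class Peer:
    """Hypothetical peer that keeps its own objects locally."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # item name -> content

    def share(self, server, item, content):
        self.store[item] = content
        server.publish(self.name, item)

    def fetch(self, server, peers, item):
        providers = server.query(item)
        if not providers:
            return None
        # Step 3: direct peer-to-peer transfer from the first provider.
        return peers[providers[0]].store[item]


server = IndexServer()
a, b = Peer("A"), Peer("B")
peers = {"A": a, "B": b}
a.share(server, "D", b"payload of D")
result = b.fetch(server, peers, "D")
```

The server holds only the index; the content itself stays on the peers, which is why the transfer in step 3 does not involve the server.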

Unstructured Centralized P2P Systems
  Advantages
    Search complexity of O(1): “just ask the server”
    Complex and fuzzy queries are possible
    Simple, fast, and finds all objects
  Problems
    No robustness: server is a single point of failure (SPOF)
    No self-organization
    No intrinsic scalability: O(N) network and system load
    Non-linearly increasing maintenance cost, in particular for achieving high availability and scalability
  But overall: best principle for small and simple applications

Unstructured P2P: Pure / Distributed

Unstructured Distributed P2P Systems
  Fully distributed approach
    Central systems are vulnerable, do not scale, and have unbalanced costs
    Unstructured P2P systems follow the opposite approach
      No global information available about the location of an item
      Information is stored only at the respective node providing it
  Retrieval of data
    No routing information for content
    Necessity to ask as many systems as possible
    Approaches: flooding, which puts a high traffic load on the network and does not scale
  Highest effort for searching
    Quick search through large areas
    Many messages needed for unique identification

Unstructured Distributed P2P Systems
  Characteristics
    All peers are equal (in their roles)
    Search mechanism is provided by cooperation of all peers
    Local view on the network
  Tasks to solve:
    Connecting to the network
      No central index server: joining strategies needed
      To join: a peer has to know at least 1 peer in the network
      Local view on the network => advertisements needed
    Search
      Different search strategies available, providing different benefits and drawbacks
    Service delivery
      Establish connection to the other node
      Peer-to-peer communication
  [Figure: overlay network and service delivery]

Unstructured Distributed P2P Example
  Principle: no information about the objects is spread
    1. Publish contents at own peer: nodes A and B (providers) store the object and tell no one
    2. Search desired object: node C, searching for the object, floods the network; nodes A and B send a reply
    3. P2P communication, get contents: node C requests the object from a subset of the repliers
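A minimal sketch of the flooding search in step 2, assuming a hop limit (`ttl`) per query and a small hypothetical topology; the function name `flood_search` and the node names X, Y, Z are illustrative only. It counts every forwarded query as one message, which shows why flooding is expensive and why an object can stay out of reach.

```python
from collections import deque


def flood_search(neighbors, stores, start, item, ttl):
    """Breadth-first flooding from `start`; returns (providers, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    providers, messages = [], 0
    while queue:
        node, hops_left = queue.popleft()
        if node != start and item in stores.get(node, ()):
            providers.append(node)  # this node would send a reply
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for nb in neighbors[node]:
            messages += 1  # every forwarded query costs one message
            if nb not in seen:  # duplicates are received but not processed again
                seen.add(nb)
                queue.append((nb, hops_left - 1))
    return providers, messages


# Hypothetical overlay: C - X - A and X - Y - Z - B
neighbors = {
    "C": ["X"], "X": ["C", "A", "Y"], "A": ["X"],
    "Y": ["X", "Z"], "Z": ["Y", "B"], "B": ["Z"],
}
stores = {"A": {"obj"}, "B": {"obj"}}

near = flood_search(neighbors, stores, "C", "obj", ttl=2)
far = flood_search(neighbors, stores, "C", "obj", ttl=4)
```

With the small hop limit only provider A is found; reaching B as well requires a larger limit and more than twice as many messages.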

Properties of Distributed P2P Systems
  Benefits:
    Robustness: every peer is dispensable
      Switch off a peer => no effect for the network
    Balanced costs: each peer contributes the same
    Self-organization
  Drawbacks:
    Slow and expensive search (flood the network)
    Finding all objects fitting the search criteria is not guaranteed
      An object may be out of reach for the search query

Unstructured Hybrid P2P Systems

Unstructured Hybrid P2P Systems
  Combine the best of both worlds:
    Robustness by distributed indexing
    Fast searches by server queries
  How it works:
    Supernodes are mini servers / super peers
    Normal peers: have overlay connections only to supernodes
      Use supernodes as servers for queries
    Supernodes: queries are flooded in the supernodes' subnetwork
  Advantages:
    More robust than centralized solutions
    Faster searches than in pure P2P systems
  Disadvantages:
    Need algorithms to choose reliable supernodes

Unstructured Hybrid P2P Systems Example: Gnutella 0.6 from R.Schollmeier and J.Eberspächer, TU München

Structured DHT-based P2P Systems

Distributed Hash Table: Steps of Operation
  Beginning: mapping nodes and data onto the same address space
    Peers and objects are addressed using flat IDs
    Nodes are responsible for data in certain parts of the address space: responsibleFor(ObjectID) = PeerID
    Association of data to nodes may change (churn)
  Later: storing / looking up data in the DHT
    Retrieving data = routing to the responsible node
      Responsible node not necessarily known in advance
      Deterministic statement about availability of data
    All nodes maintain routing information to other nodes
  Limitations
    Maintenance of routing information required
    Load balancing problematic
    No fuzzy search supported
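As a rough illustration of responsibleFor(ObjectID) = PeerID, the sketch below hashes names into a small ID space and picks the first peer whose ID is greater than or equal to the object ID, wrapping around at the end. The successor rule, the 12-bit space, and the peer IDs are taken from the Chord example ring in this deck; other DHTs assign ranges differently, and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 12  # assumption: 12-bit ID space (0..4095), matching the example ring
PEER_IDS = [611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, 3485]  # sorted


def object_id(name: str) -> int:
    """Map a name onto the flat ID space: SHA-1 digest reduced mod 2^M."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def responsible_for(obj_id: int) -> int:
    """Return the first peer ID >= obj_id, wrapping around to the smallest peer."""
    i = bisect_left(PEER_IDS, obj_id % (2 ** M))
    return PEER_IDS[i % len(PEER_IDS)]


key = object_id("my data")
owner = responsible_for(key)
```

Because peers and objects share one address space, any node can compute the key locally; only the mapping from key to responsible peer needs routing.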

Structured Overlay Networks: Example
  Principle: the location of the objects is found via routing
    1. Publish link at responsible peer: node A (provider) advertises the object at responsible peer B; the advertisement is routed to B
    2. “Routing” to / lookup of desired object: node C, looking for the object, sends a query to the network; the query is routed to the responsible node
    3. P2P communication, get link to object: node B replies to C by sending the contact information of A

Chord: Network Topology
  Uses SHA-1 to map IP address / object name to a 160-bit ID
  Basic ring topology
    Circular key space, mod 2^n
    Successor / predecessor: link to ring successor
  Enhanced topology
    The k-th finger of peer n is a shortcut pointing to the peer responsible for object ID (n + 2^k)
    O(log(N)) fingers lead to a lookup operation of O(log(N))
    Fingers point to peers with object IDs increasing exponentially. Here: 709 + 2^k = …, 965, 1221, 1733, 2757
  Example ring (peer ID: responsible ID range)
    611: 3486-4095 and 0-611
    659: 612-659
    709: 660-709
    1008: 710-1008
    1622: 1009-1622
    2011: 1623-2011
    2207: 2012-2207
    2682: 2208-2682
    2906: 2683-2906
    3485: 2907-3485
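The finger rule can be checked against the example ring with a short sketch. It assumes the 12-bit ID space implied by the ranges 0-4095 in the figure (not the full 160-bit SHA-1 space); the function names are hypothetical.

```python
from bisect import bisect_left

M = 12  # assumption: the example ring uses IDs 0..4095
PEERS = [611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, 3485]  # sorted


def responsible_peer(object_id):
    """First peer with ID >= object_id on the ring (wraps around)."""
    i = bisect_left(PEERS, object_id % (2 ** M))
    return PEERS[i % len(PEERS)]


def finger_targets(n):
    """Object IDs n + 2^k (mod 2^M) that the fingers of peer n aim at."""
    return [(n + 2 ** k) % (2 ** M) for k in range(M)]


def fingers(n):
    """Peers that the fingers of peer n point to."""
    return [responsible_peer(t) for t in finger_targets(n)]


targets_709 = finger_targets(709)
fingers_709 = fingers(709)
```

For peer 709 the last four targets are 965, 1221, 1733, and 2757, as on the slide. Many of the short fingers land on the same peer (1008), so the number of distinct fingers is much smaller than the number of targets.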

Chord: Addressing Content
  Query
    Contains the hash value of the queried content
    On each step the distance to the destination is at least halved
  Use fingers to locate the destination faster
    Without fingers: no shortcuts, walk the circle
  Example: node 1008 queries item 3000
    1. Node 1008 forwards to the peer responsible for 1008 + 1024 (peer 2207)
    2. Peer 2207 forwards to the peer responsible for 2207 + 512 (peer 2906)
    3. Peer 2906 reaches the peer responsible for 3000 (peer 3485): responsible peer found
  [Figure: ring with peers 611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, and 3485 and their responsible ID ranges]
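The example lookup can be reproduced with a small simulation. This is a sketch under assumptions: a 12-bit ID space as in the figure, global knowledge of the ring used only to build each peer's fingers, and the usual rule of forwarding to the closest finger that precedes the key. All function names are hypothetical.

```python
from bisect import bisect_left

M = 12
SPACE = 2 ** M
PEERS = [611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, 3485]  # sorted


def responsible_peer(object_id):
    i = bisect_left(PEERS, object_id % SPACE)
    return PEERS[i % len(PEERS)]


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], with wrap-around."""
    x, a, b = x % SPACE, a % SPACE, b % SPACE
    if a < b:
        return a < x <= b
    return x > a or x <= b


def fingers(n):
    return [responsible_peer(n + 2 ** k) for k in range(M)]


def lookup(start, key):
    """Return the list of peers visited, ending at the responsible peer."""
    path, node = [start], start
    while True:
        succ = responsible_peer(node + 1)  # ring successor of the current peer
        if in_interval(key, node, succ):
            path.append(succ)  # the successor is responsible for the key
            return path
        # Forward to the farthest finger that still precedes the key.
        for f in reversed(fingers(node)):
            if in_interval(f, node, key - 1):
                node = f
                break
        path.append(node)


path = lookup(1008, 3000)
```

The resulting path is 1008, 2207, 2906, 3485, which matches the three hops in the example.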

Chord: Join Procedure
  Request to join the Chord ring
    New node (e.g. 1289) contacts a member of the ring (e.g. 2906)
    Contacted node routes the query to the responsible node (1622)
    Responsible node (1622) contacts the new node
  Then:
    1. Set / contact successor
    2a. Set new predecessor
    2b. Redistribute indexing information (e.g. 1009-1289)
    3. Update successor of predecessor
    4. Build fingers: the fingers of peer n point to the peers responsible for ObjectID n + 2^k; thus, log(N) fingers are built
  [Figure: ring after the join; new peer 1289 is responsible for 1009-1289 and peer 1622 for 1290-1622; all other ranges are unchanged]
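The effect of the join on the ID ranges can be sketched as follows. The code only models who is responsible for what before and after the join; message exchange and finger construction are left out, and the helper names `ranges` and `join` are hypothetical.

```python
from bisect import bisect_left

SPACE = 4096  # assumption: 12-bit ID space as in the example ring
RING = [611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, 3485]


def ranges(peer_ids):
    """Map each peer to (first_id, last_id); the smallest peer's range wraps around."""
    ids = sorted(peer_ids)
    return {pid: ((ids[i - 1] + 1) % SPACE, pid) for i, pid in enumerate(ids)}


def join(peer_ids, new_id):
    """Return (new ring, predecessor, successor) for a joining peer."""
    ids = sorted(peer_ids)
    i = bisect_left(ids, new_id)
    successor = ids[i % len(ids)]  # peer currently responsible for new_id
    predecessor = ids[i - 1]       # index -1 wraps to the largest peer
    return sorted(ids + [new_id]), predecessor, successor


before = ranges(RING)
new_ring, pred, succ = join(RING, 1289)
after = ranges(new_ring)
```

After the join, peer 1622 keeps only 1290-1622 and hands the index entries for 1009-1289 to the new peer, while predecessor 1008 only has to update its successor pointer.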

Summary
  Motivation
  Peer-to-Peer principle
  Overlay networks
  Unstructured P2P networks
    Centralized P2P network
    Distributed P2P network
  Structured DHT-based P2P networks
    Chord

Questions?
  Thank you for your attention. Any questions?

Content Addressable Network (CAN)
  A d-dimensional hash table in a cyclic coordinate space
    d hash functions, 1 per coordinate
    PeerID(p) = (h1(p), h2(p), ..., hd(p))
    ObjID(obj) = (h1(obj), …, hd(obj))
  CAN nodes
    Each is responsible for a distinct rectangular zone of the space
    Store data that hashes into their zone
    Together the peers cover the entire space
  Routing:
    A peer knows the IP addresses and zone ranges of its neighbors
    Peers can communicate only with their neighbors
  Properties
    Routing table size O(d)
    Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of peers
  [Figure: 2-dimensional CAN on a grid with coordinates 0-7 on each axis, peers n1-n5 and files f1-f4]

CAN: New Peer Join
  New node has to acquire a zone to be responsible for
  Steps:
    The node randomly chooses a point P in the space
    The zone which includes P is split into two halves
  Example: new node n6 requests to join
    Contacts a node (e.g. n5)
    Selects point P
    n5 routes the join query to n1
    n1 splits its zone
    n6 is responsible for the new zone (at point P)
  [Figure: 2-dimensional CAN on a grid with coordinates 0-7 on each axis, peers n1-n5, files f1-f4, and the new peer n6 at point P. Figure modified from another presentation]
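A minimal sketch of the zone split, assuming half-open rectangular zones and a split along the longest dimension (the slide does not say along which dimension a zone is halved, so that rule is an assumption). The `Zone` class, the `join` helper, and the two-zone layout are hypothetical and not the layout of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # lower corner, inclusive
    hi: tuple  # upper corner, exclusive

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def split(self):
        """Halve the zone along its longest dimension (assumed rule)."""
        sizes = [h - l for l, h in zip(self.lo, self.hi)]
        dim = sizes.index(max(sizes))
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower_hi = self.hi[:dim] + (mid,) + self.hi[dim + 1:]
        upper_lo = self.lo[:dim] + (mid,) + self.lo[dim + 1:]
        return Zone(self.lo, lower_hi), Zone(upper_lo, self.hi)


def join(zones, new_peer, p):
    """Split the zone containing point p; the new peer takes the half with p."""
    owner = next(peer for peer, z in zones.items() if z.contains(p))
    a, b = zones[owner].split()
    new_zone, old_zone = (a, b) if a.contains(p) else (b, a)
    updated = dict(zones)
    updated[owner] = old_zone
    updated[new_peer] = new_zone
    return owner, updated


zones = {"n1": Zone((0, 0), (4, 8)), "n2": Zone((4, 0), (8, 8))}
owner, zones_after = join(zones, "n6", (1, 6))
```

In this toy layout the point P = (1, 6) falls into the zone of n1, which is split along the second coordinate; n6 takes the upper half and n1 keeps the lower half.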

Summary
  Motivation
  Peer-to-Peer principle
  Overlay networks
  Unstructured P2P networks
    Centralized P2P network
    Distributed P2P network
  Structured DHT-based P2P networks
    Chord
    Content Addressable Network

After this slide: slides in reserve

Requirements for Overlay Networks
  Fault tolerance: resilience of the connectivity when failures are encountered through arbitrary departure of peers
  Heterogeneity: considering variations in physical capabilities and peer behavior (e.g. file fishing)
  Fairness: evenly distributing workload across nodes
  Security: ability of a system to manage, protect, and distribute sensitive information
  Privacy: degree to which a system or component allows for (or supports) anonymous transactions

Requirements for Overlay Networks: Trade-offs
  Time vs. space: e.g. local information vs. complete replication of indices
  Security vs. privacy: e.g. fully logged operations vs. totally untraceable
  Efficiency vs. completeness: e.g. exact key-based matching vs. partial matching (use of wildcards)
  Scope vs. network load with TTL (time to live): e.g. TTL-based requests vs. exhaustive search
  Efficiency vs. autonomy: e.g. hierarchical vs. pure P2P overlays
  Reliability vs. low maintenance overhead: e.g. deterministic vs. probabilistic operations

Centralized P2P Networks
  Central index server, maintaining the index:
    What: object name, file name, criteria (ID3) …
    Where: (IP address, port)
    Search engine, combining both pieces of information
    Global view on the network
  Normal peer, maintaining the objects:
    Each peer maintains only its own objects
    Decentralized storage (content at the edges)
    File transfer between clients (decentralized)
  Issues:
    Unbalanced costs: central server is a bottleneck
    Security: server is a single point of failure
  [Figure: central server and overlay network]

Search in Centralized P2P Networks
  Search
    1. Query server: peers contact the central server asking for objects which fulfill some criteria
    2. Server answers: the central server answers with a list of addresses of peers that contain objects with these criteria
    3, 4. Service delivery (P2P): the peer contacts the peers containing the desired objects; the object is transferred / the service is provided

Step 1: Addressing in Distributed Hash Tables
  Mapping of content / nodes into a linear space
    Usually: 0, …, 2^m - 1 >> number of objects to be stored
    Mapping of data and nodes into an address space (e.g. 0 to 2^m - 1) with a hash function
      E.g. Hash(String) mod 2^m: H("my data") → 2313
    Association of parts of the address space with DHT nodes
  Often, the address space is viewed as a circle
  [Figure: address space from 0 to 2^m - 1, drawn as a line and as a circle, partitioned into the ranges 611-709, 1008-1621, 1622-2010, 2011-2206, 2207-2905, 2906-3484, and 3485-610 (wrapping around); H(Node X) = 2906, H(Node Y) = 3485, data item "D": H("D") = 3107]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Step 2: Association of Address Space with Nodes
  Arrangement of the range of values
    Each node is responsible for part of the value range
      Often with redundancy (overlapping of parts)
      Continuous adaptation
    Real (underlay) and logical (overlay) topology are (mostly) uncorrelated
  Example: node 3485 is responsible for data items in the range 2907 to 3485 (in case of a Chord DHT)
  [Figure: logical view of the distributed hash table with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485, and its mapping onto the real topology]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Step 3: Locating a Data Item
  Locating the data: content-based routing
  Goal: small and scalable effort
    O(1) with a centralized hash table
      But: management of a centralized hash table is too costly (server)
    Minimum overhead with distributed hash tables
      O(log N): DHT hops to locate an object
      O(log N): number of keys and routing information per node (N = number of nodes)
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Step 4: Routing to a Data Item
  Routing to a key/value pair
    Start the lookup at an arbitrary node of the DHT
    Routing to the requested data item (key)
  Example: H("my data") = 3107
    Key = H("my data"), value = pointer to the location of the data: (3107, (ip, port))
    The initial node is arbitrary; node 3485 manages keys 2907-3485
  [Figure: ring with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Step 5: Data Retrieval: Usage of the Located Resource
  Accessing the content
    The key/value pair is delivered to the requester
    The requester analyzes the key/value tuple (and downloads the data from the actual location, in case of indirect storage)
  Example: H("my data") = 3107
    Node 3485 sends (3107, (ip/port)) to the requester
    In case of indirect storage: after learning the actual location, the data is requested with Get_Data(ip, port)
  [Figure: ring with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

(Step 6) Where is the Data Located? Association of Data with IDs
  Direct storage
    [Figure: data item D from host 134.2.11.68, with H_SHA-1("D") = 3107, placed on the ring of nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485]
  Indirect storage
    [Figure: the same ring holds the entry "Item D: 134.2.11.68" under H_SHA-1("D") = 3107, while D itself remains at host 134.2.11.68]

Distributed Hash Table: Insert and Delete a Node
  Join of a new node
    Calculation of the node ID
    New node contacts the DHT via an arbitrary node
    Assignment of a particular hash range
    Copying of the key/value pairs of the hash range (usually with redundancy)
    Binding into the routing environment
  [Figure: ring with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485; the new node 134.2.11.68 receives ID 3485]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Node Failure and Node Departure
  Failure of a node
    Use of redundant key/value pairs (if a node fails)
    Use of redundant / alternative routing paths
    A key/value pair is usually still retrievable if at least one copy remains
  Departure of a node
    Partitioning of the hash range to neighbor nodes
    Copying of key/value pairs to the corresponding nodes
    Unbinding from the routing environment
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Summary of DHTs: Properties
  Hash buckets distributed over nodes
  Nodes form an overlay network
    Route messages in the overlay to find the responsible node
    The routing scheme in the overlay network is the difference between different DHTs
  DHT behavior and usage:
    Node knows the “object” name and wants to find it
      Unique and known object names assumed
    Node routes a message in the overlay to the responsible node
    Responsible node replies with the “object”
      Semantics of “object” are application-defined

Chord: Join Procedure (1)
  Request to join the Chord ring (new peer 1289)
    1. Contact a member of the ring
    2. Route the query in the ring
    3. Provide the new peer's successor
  [Figure: ring before the join, with peers 611, 659, 709, 1008, 1622, 2011, 2207, 2682, 2906, and 3485; peer 1622 is still responsible for 1009-1622]

CAN: New Peer Join
  New node has to acquire a zone to be responsible for
  Steps:
    The node randomly chooses a point P in the space
    The zone which includes P is split into two halves
  Example: node n6 requests to join
    Contacts n4
    Selects point P
    n4 routes the join query to n1
    n1 splits its zone
    n6 is responsible for the new zone (at point P)
  [Figure: 2-dimensional CAN on a grid with coordinates 0-7 on each axis, peers n1-n5, files f1-f4, and the new peer n6 at point P]

CAN: Routing
  Two CAN nodes are neighbors if their zones overlap along d-1 dimensions and abut (border directly) along one dimension
  I.e.
    A node knows the IP addresses of its neighbors
    A node knows the coordinates of the neighboring zones
    Nodes can communicate only with their neighbors
  Properties
    Routing table size O(d)
    Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
  Lookup example: node n5 is looking for file f3
  [Figure: 2-dimensional CAN on a grid with coordinates 0-7 on each axis, peers n1-n5 and files f1-f6]
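The neighbor rule and the hop bound can be written down directly. The sketch below represents a zone as one half-open interval per dimension and, as a simplification, ignores the wrap-around of the cyclic coordinate space; the function names and the example zones are hypothetical.

```python
def are_neighbors(zone_a, zone_b):
    """Zones are tuples of (lo, hi) intervals, one per dimension.

    Neighbors overlap along d-1 dimensions and abut along exactly one.
    Simplification: wrap-around at the edges of the space is not considered.
    """
    abut = overlap = 0
    for (alo, ahi), (blo, bhi) in zip(zone_a, zone_b):
        if ahi == blo or bhi == alo:
            abut += 1      # intervals touch end to end
        elif alo < bhi and blo < ahi:
            overlap += 1   # intervals share a stretch
    return abut == 1 and overlap == len(zone_a) - 1


def hop_bound(d, n):
    """Upper bound d * n^(1/d) on lookup steps, as stated on the slide."""
    return d * n ** (1 / d)


left = ((0, 4), (0, 4))
right = ((4, 8), (0, 4))     # shares an edge with `left`
diagonal = ((4, 8), (4, 8))  # touches `left` only at a corner
distant = ((6, 8), (0, 4))   # neither abuts nor overlaps in the first dimension
```

Zones that touch only at a corner abut along two dimensions and therefore do not count as neighbors.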

Kademlia
  Widely deployed DHT (used e.g. in "eMule")
  Features
    DHT-based overlay network using the XOR metric
      E.g. D(110101, 010001) = 4 + 32 = 36
      Simple operation
      Symmetrical (A->B == B->A)
    Use lookup messages to maintain the overlay network (cannot be applied to asymmetrical Chord)
    Multiple routing possibilities
      Parallel asynchronous queries
      Overcome faulty nodes
  Properties
    Scalable: logarithmic complexity in node degree (O(log(N))) and lookup steps (O(log(N)))
    Efficient: low maintenance cost
    Robust
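The XOR metric from the example is a one-liner on integer IDs. The helper `closest` is a hypothetical illustration of how a node could rank its contacts by distance to a target; it is not part of any particular Kademlia implementation.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return a ^ b


def closest(target: int, contacts, k: int = 1):
    """Return the k contacts with the smallest XOR distance to target."""
    return sorted(contacts, key=lambda c: xor_distance(c, target))[:k]


d = xor_distance(0b110101, 0b010001)  # bits 2 and 5 differ: 4 + 32
nearest = closest(0b101, [0b001, 0b110, 0b100])
```

Because XOR is symmetric, a node that receives a query from a peer learns a contact that is exactly as useful for its own routing, which is why ordinary lookup traffic can maintain the overlay.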

Kademlia: System Description
  Nodes are leaves in a binary tree
    Position is determined by the shortest unique prefix of the node ID
  For every node (e.g. node "001")
    Divide the binary tree into a series of successively lower sub-trees that do not contain that node
    At least one contact node is required in each sub-tree
    Large sub-trees provide many neighbor alternatives

Kademlia: Lookup Operation
  Every hop forwards the query to a smaller sub-tree around the target
  Example: node "001" routes to "101"
    1. Node "001" needs a contact in the 1-(*) sub-tree (e.g. "110")
    2. Node "110" needs a contact in the 10-(*) sub-tree (e.g. "100")
    3. Node "100" forwards the query to destination "101"
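The example path can be checked numerically: along 001, 110, 100, 101 the prefix shared with the target grows by one bit per hop while the XOR distance shrinks. The helper names below are hypothetical, and IDs are kept as 3-bit strings as on the slide.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits that two equally long bit strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def xor_distance(a: str, b: str) -> int:
    """XOR distance between two IDs given as bit strings."""
    return int(a, 2) ^ int(b, 2)


target = "101"
path = ["001", "110", "100", "101"]
prefixes = [common_prefix_len(node, target) for node in path]
distances = [xor_distance(node, target) for node in path]
```

Each hop enters a sub-tree that shares one more leading bit with the target, so a lookup needs at most as many hops as the ID has bits.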

Communication Networks II: Peer-To-Peer Networking
  Note: many images were taken and adapted from contributions to the book “P2P Systems and Applications”, ed. Steinmetz, Wehrle
  [Figure: DHT ring with nodes 611, 709, 1008, 1622, 2011, 2207, 2906, and 3485 mapped to the hosts 7.31.10.25, peer-to-peer.info, 12.5.7.31, 95.7.6.10, 86.8.10.18, planet-lab.org, berkeley.edu, and 61.51.166.150; query H("my data") = 3107]

Structured Overlay Networks
  Principle
    Peers and objects have identifiers
    Objects are stored on peers according to their ID
    Distributed indexing points to the object location
    Lookup / addressing: retrieve the object which is identified by a given identifier
  Steps
    1. Publish contents at the responsible peer
    2. “Routing” to / lookup of the desired object
    3. P2P communication, get contents

Unstructured centralized P2P networks
  Simple strategy: a central server stores information about locations
    1. Node A (provider) tells the server that it stores item D
    2. Node B (requester) asks server S for the location of D
    3. Server S tells B that node A stores item D
    4. Node B requests item D from node A
  [Figure: node A, server S, and node B exchanging the messages “A stores D”, “Where is D?”, and “A stores D”, followed by the transmission of D to node B]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle

Unstructured distributed P2P networks
  Fully decentralized approach
    No information about the location of data at intermediate systems
    Necessity for broad search
  Steps
    Node B (requester) asks neighboring nodes for item D
    Nodes forward the request to further nodes (breadth-first search / flooding)
    Node A (provider of item D) sends D to the requesting node B
  [Figure: node B floods the query “B searches D” through the overlay; node A answers “I store D”, followed by the transmission of D to node B]
  The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle
