Jan Sacha, Jim Dowling, Raymond Cunningham, and René Meier
Distributed Systems Group,
Department of Computer Science,
Trinity College, Dublin
Measurements on deployed peer-to-peer (P2P) systems show that the distributions of peer characteristics such as the uptime, bandwidth, or available storage space, are highly skewed and often follow heavy-tailed distribution [1,2,3]. Researchers have also reported that the use of low stability or low performance peers can lead to poor performance in a P2P system [4,5]. Consequently, in order to improve the system efficiency, many P2P systems attempt to assign most important services to selected, high capability peers, called super-peers [6,7,8,9,10].
However, super-peer election in P2P environments poses a number of difficult problems. Due to the massive scale, dynamism, and complexity of P2P systems, it is not feasible for a peer or any other entity to maintain a global view of the system. Inherent decentralisation of P2P environments introduces uncertainty in decision making. Traditional election algorithms, such as the Bully algorithm , and other classical approaches to group communication , potentially require communication with all peers in the system, and can only be applied to small clusters of peers. Other approaches to super-peer discovery, based on flooding or random walking , are difficult in large P2P systems due to high communication overhead that increases as the size of the P2P system grows. Solutions based on manual or static configuration of super-peers are inappropriate due to a lack of system-wide knowledge of peer properties.
This paper proposes a decentralised and self-managing approach to super-peer discovery based on gossipping. The paper introduces a peer utility metric in section 3 and applies a decentralised aggregation algorithm, covered in section 4, that generates a utility histogram in order to estimate the distribution of peer utility in the system. In section 5, a self-organising gradient topology is constructed based on the utility metric that allows peers to apply an efficient search heuristic for super-peer discovery described in section 6. The utility histogram is used for adaptive super-peer criteria selection. The presented approach is evaluated in section 7 and this evaluation shows that the aggregation algorithm provides a good approximation of peer utility distribution and that the search heuristic based on the utility metric achieves high search performance (significantly better than random walking). Section 8 concludes the paper.
Recent research on P2P systems has been primarily focused on Distributed Hash Tables [14,15,16,17], where the main goal is to provide efficient routing between any pair of peers. In our approach, we are focusing on searching for peers with particular properties in the system (high utility), and assuming that system services are placed on these peers, we provide a mechanism that allows the efficient discovery and consumption of these services. Furthermore, DHTs assume that peer identifiers are unique and relatively static, uniformly distributed in a key space. In our approach, the utility is dynamic and may follow any distribution with multiple peers potentially having the same utility value.
A number of P2P systems based on super-peers have been proposed. Yang and Molina  investigate general principles of designing super-peer-based networks, however, they do not provide any specific super-peer election algorithm. OceanStore  proposes to elect a primary tier ``consisting of a small number of replicas located in high-bandwidth, high connectivity regions of the network'' for the purpose of handling updates, however, no specific algorithm for the election of such a tier is presented. Brocade  improves routing efficiency in a DHT by exploiting resource heterogeneity, but unlike our approach, it does not address the super-peer election problem.
Chord [14,10] shows that the load between peers can be balanced by assigning multiple virtual servers to high performance physical hosts. The DHT structure may be used for the discovery of under- or over-loaded peers using random sampling, distributed directories, and other similar techniques. Mizrak et al  proposes the use of high capacity super-peers to improve routing performance in a DHT. However, these systems focus on load balancing in a DHT rather than the selection of potential super-peers from the set of all peers in the system.
Montresor  proposes a protocol for super-peer overlay generation, however, unlike our gradient topology, his topology maintains a discrete (binary) distinction between super-peers and client peers. In contrast, our approach introduces a continuous peer utility spectrum and approximates the distribution of peer utility in the system in order to discover peers above an adaptive utility threshold. Our neighbour selection algorithm can be seen as a special case of the T-Man protocol  that generates a gradient topology, where the ranking function is based on our peer utility metric. The advantage of such a utility ranking function is that applications can exploit the constructed topology in order to elect appropriate super-peers.
Kempe et al  describes a push-based gossip algorithm for the computations of sums, averages, random samples, and quantiles, and provides a theoretical analysis of the algorithm. Montresor, Jelasity and Babaoglu [21,22] introduce a push-pull-based approach for aggregate computation, however, their model assumes that message exchange between any two peers is atomic and that the clocks of peers are synchronised. We have extended Kempe's algorithm to calculate histograms, and we have added a peer leave procedure that improves the behaviour of the algorithm in the face of peer churn. We are using the aggregates for adaptive super-peer threshold calculation.
This section introduces peer utility as a metric that captures the application-specific requirements and measures the capability of a peer to become a super-peer. Depending on the domain, the utility metric may involve a number of parameters. For example, in a P2P storage system, the utility may place greater emphasis on a peer's available local storage space and bandwidth. In a multimedia streaming application, the utility may combine a peer's latency and bandwidth, while in a grid computing system the utility may be a function of a peer's CPU load and availability.
A simple approach to utility calculation would be for each peer to individually compute their own utility. A more sophisticated utility metric may involve feedback from neighbouring peers. In either case, the utility of a peer is a local or microscopic property of a peer (or neighbourhood of peers). In an untrusted environment, a decentralised approach to trust may be adopted to prevent malicious peers from providing fake utility information about themselves.
Given that the utility of each peer in the topology can be calculated by a common function , the selection of super peers then becomes a question of how can an individual peer discover a high utility peer. In one possible approach, a peer may search for a super-peer above an absolute utility value. However, in many other applications, before a peer attempts to discover a high utility peer, the peer needs to estimate the distribution of peer utility in the system in order to know what constitutes high utility in a running system. For example, if an application requires the selection of the most stable peers in the system, it needs to learn the peer stability characteristics before it can decide on the stability threshold for a super-peer. Otherwise, if the super-peer threshold is static (hardwired), it may happen that no peer in the system satisfies the criteria, or that the threshold is very low and hence sub-optimal. Moreover, due to the system's dynamism, the super-peer selection criteria has to be continuously adapted to the system's current state and peer availability.
In the remainder of this paper, we describe a set of algorithms that provide solutions to the problems highlighted above. A decentralised aggregation technique is shown that allows peers to estimate the distribution of peer utility in the system and from this to identify an adaptive super-peer selection threshold. The gradient topology and gradient search heuristic are shown that enable the efficient discovery of (super)peers above a given utility threshold.
Our approach to aggregation is based on the algorithms described by Kempe  and Montresor . We adopt a push-based gossip model, since it can be implemented using asynchronous message passing and does not require synchronisation between peers.
In our approach, each peer continuously maintains estimates of a number of system properties by gossipping with neighbours. A peer, , has an estimate of the current number of peers in the system, , the maximum peer utility in the system, and a cumulative histogram of peer utility values, . Each of these values approximate the true system properties , , and . The cumulative utility histogram with bins of width is defined as. Parameter is also called the histogram resolution. The histogram is a discrete approximation of the peer utility distribution in points, where each bin corresponds to a single point of the distribution function.
Peers joining the network contact any peer already in the system and obtain an initial set of neighbours and a current approximation of , , and . A newly joining peer has minimum utility, which is zero, and the maximum utility of any peer is unbounded. The number of histogram bins, , is constant in the algorithm.
Peers periodically execute a gossip-based algorithm, where at one step (or round) of the algorithm a peer can send (push) messages to a number of neighbours and receive messages sent by its neighbours in the previous round. A sequence of steps that leads to a new approximation of , , and is called an aggregation epoch. An epoch can be potentially initiated by any peer at any time step, and the information about the newly created epoch is gradually propagated to all peers in the system. In order to distinguish between different, possibly overlapping, epochs, each epoch is tagged by a unique identifier selected by the initiating peer. Every peer maintains a cache that stores the identifiers of aggregation epochs that this peer has participated in. The duration of an epoch is delimited by a time-to-live value. At the end of an epoch, every peer updates its estimates , , and .
The algorithm performed at each step by a peer is shown in Figure 1. In line 1, peer starts a new aggregation epoch with probability . Thus, a new epoch is started by the system with average frequency (every time steps). The epoch is initiated by creating an aggregation message with a new epoch and a weight , as specified by Kempe. The field is initialised with an value, since informally speaking, the propagation speed of push-based epidemics is exponential and requires only steps with high probability to reach all peers in the system . The histogram bin width is calculated as . Furthermore, aggregation messages include a field used to estimate labelled , a field used to estimate labelled , and finally, a histogram, , consisting of entries representing individual histogram bins. By combining all aggregation information in one message, the algorithm reduces the total number of messages generated, and thus limits the network traffic generated. For a 100-bin histogram, the aggregation message size is below 1KB.
In lines 2-12 of Figure 1, a peer performs the aggregation of received messages. A peer that receives an aggregation message with a new epoch identifier, i.e., with an field that is not stored in the cache, joins this new aggregation (lines 13-20) by adding the value of to its field and to all histogram bins according to formula (1). If the value is less than (indicating the end of the epoch), a peer updates its current estimates of the system properties (lines 21-25). Otherwise, the peer emits a message to a random neighbour and to itself so that this peer will continue to participate during the next aggregation round (lines 26-30).
The algorithm exhibits the property of mass conservation defined by Kempe  provided that no peers fail during an aggregation epoch. At any time step, for each aggregation epoch, the sum of the weights of all aggregation messages in the system is always equal to one, i.e., . Furthermore, the sum of fields of all messages is equal to the number of peers participating in the aggregation, the maximum of fields is equal to the maximum utility among peers participating in the aggregation, the average value of fields of all messages at subsequent rounds decreases by one, and for , , where is the utility histogram for peers participating in the aggregation.
In order to ensure mass conservation, each peer leaving the system is required to perform a leave procedure shown in Figure 2. In lines 1-11 of this figure, a peer aggregates currently buffered messages (as in lines 2-12 of Figure 1). In lines 12-15 of Figure 2, the peer subtracts the value of from the field and from the histogram bins. Finally, in lines 16-18, the peer sends a message containing the aggregated values to a random neighbour.
During one round of the aggregation algorithm, each peer participating in an epoch generates one aggregation message. The epochs are initiated on average every rounds (frequency ), and since each epoch lasts on average rounds, the average number of aggregation messages generated and received by each peer in one round is bounded by , or if is .
In this section, we introduce the self-organising gradient P2P topology and we outline its main properties. We briefly discuss the neighbour selection algorithm that generates the gradient topology. The topology is exploited by the gradient search heuristic.
The gradient topology is a P2P topology where the highest utility peers are connected with each other and form a core in the system, while lower utility peers are located gradually farther from the core. The core, which clusters the highest utility peers in the system, corresponds to a set of super-peers in the system. Figure 3 shows a visualisation of a gradient topology. The position of each peer in the topology is determined by the peer's utility.
We have designed and evaluated a self-organising neighbour selection algorithm that generates the gradient topology in a completely decentralised P2P environment. Each peer maintains two sets of neighbours, a similarity-based set, , and a random set, . Peers periodically gossip with each other and exchange their sets. On receiving both sets from a neighbour, a gossipping peer selects one entry whose utility level is closest to its own utility and replaces an entry in its similarity-based set. This behaviour clusters peers with similar utility characteristics and generates the gradient structure of the topology. In addition, a gossipping peer randomly selects an entry from the received random set and replaces a random entry in its random set. Connections to random peers allow peers to explore the network in order to discover other potentially similar neighbours. This significantly reduces the possibility of more than one cluster of high utility peers forming in the network. Random connections also reduce the possibility of the gradient topology partitioning due to excessive clustering. Moreover, random connections between peers are used by the aggregation algorithm described in section 4. Peer removes a random entry from or if the number of entries in the sets exceeds the maximum allowed number of connections.
In addition to the neighbour sets, a peer maintains a cache that stores an estimated utility value, , for each neighbour . Entries in the cache are timestamped and peers exchange these entries whenever they gossip.
Our initial evaluation of the neighbour selection algorithm, described in , shows that the algorithm generates a P2P topology with a very small diameter (an order of 5-6 hops for 100,000 peers) and that it has a global gradient structure.
The emergence of a gradient topology is a result of the system's self-organisation. Peers are independent, have limited knowledge about the system and interact with a limited number of neighbours. Utility can be considered as a microscopic property of a peer which enables through peer interaction the construction of the macroscopic gradient structure.
The gradient structure of the topology allows us to develop an efficient search heuristic, called gradient search, that enables the discovery of high utility peers in the system. The algorithm exploits the information contained in the topology to limit the search space to a relatively small subset of peers and to achieve a significantly better search performance than traditional search techniques, such as random walking .
The goal of the search algorithm is to deliver a message from any peer in the system to a super-peer in the core, i.e., to a peer with utility above a certain threshold. The value of the threshold is assigned by a peer that initiates the search and is calculated using the utility histogram generated by the aggregation algorithm described in section 4. The threshold is included in the search message. A peer below the specified utility threshold forwards search messages to higher utility peers until a peer is found whose utility is above the threshold. Each message is associated with a time-to-live (TTL) value that determines the maximum number of hops the message can be propagated.
In gradient search, each peer greedily forwards messages to its highest utility neighbour, i.e., to a neighbour whose utility is equal to
Thus, messages are forwarded along the utility gradient, as in hill climbing and other similar techniques. It is important to note that the gradient search strategy is generally applicable only to a gradient topology. It assumes that a higher utility peer is closer to the core in terms of the number of hops than a lower utility peer.
Local maxima should never occur in an idealised gradient topology, however, every P2P system is under constant churn and the gradient topology may undergo local perturbations from the idealised structure. In order to prevent message looping in the presence of such local maxima, a list of visited peers is appended to each search message, and a constraint is imposed that messages are never forwarded to already visited peers.
In this section, we describe our experimental setup and present the results of our experiments. The experiments evaluate the precision of the aggregation algorithm and the performance of gradient search.
The following notation and metrics are used. We measure the average error in histogram estimation, , defined as
where , , , , and correspond to , , , , and at time of the experiment, is the duration of the experiment, and is a histogram distance function defined as
Similarly, we define as the average error in the estimation of , and as the average error in the estimation of over the course of the experiment.
We compare the performance of gradient search with random walking by measuring two properties of both algorithms. We calculate the average number of hops in which the algorithms deliver a search message from a random peer in the network to a super-peer in the core, i.e., to a peer above a certain utility threshold, and we measure the search failure rate, i.e., the percentage of search messages that are never delivered to super-peers.
The super-peer utility threshold is determined by each peer individually
using the utility histogram calculated by the aggregation algorithm.
A peer, , sets the threshold, , to a value that corresponds
to 1% of highest utility peers. This value is approximated using
the following formula
where is the histogram width and is the number of bins in the histogram.
We evaluate the aggregation and search algorithms in a discrete event simulator. An individual experiment consists of a set of peers, connections between peers, and messages passed between peers. We assume all peers are mutually reachable, i.e., any pair of peers can establish a connection. We also assume that it takes exactly one time step to pass a message between a pair of connected peers. We do not model network congestion, however, we limit the maximum number of concurrent connections per peer. In order to reflect network heterogeneity, we limit the number of peer connections according to the Pareto distribution (power law) with an exponent of 1.5 and a mean of 24 connections per peer.
The simulated P2P network is under constant churn. Every new peer is assigned a session duration, , according to the Pareto distribution with an exponent of and minimum value . Thus, the expected session duration is given by the formula . We calculate the churn rate as the fraction of peers that leave (or join) the system at one step of the simulation. Over the lifetime of a running system, the average churn rate, , is equal to the inverse of the expected peer session time , therefore, in order to simulate a churn rate, , in the system, we set to
We assume that 10% of peers leave the system without performing the leave procedure of the aggregation algorithm (i.e., they crash).
A central bootstrap server is used that stores the addresses of peers that have most recently joined the network. The list includes ``dangling references'' to peers that may have already left the system. Every joining peer receives an initial random set of 20 neighbours from the bootstrap server. If a peer becomes isolated from the network (i.e., has no neighbours), it is bootstrapped again. The bootstrap server executes the aggregation algorithm and provides initial estimates of , , and , for peers entering the system.
We start each individual experiment from a network consisting of a single peer. The number of peers is increased by one percent at each time step, until the network grows to the size required by the experiment. Afterwards, the network is still under continuous churn, however, the rate of arrivals is equal to the rate of departures and the number of peers in the system remains constant. Each peer continuously performs the neighbour selection and aggregation algorithms at every time step after it is bootstrapped. Additionally, at each turn, a number of randomly selected peers emit search messages that are routed using gradient search or random walking.
For the purpose of the simulation, in all experiments, the number of bins in the utility histogram is 100 , the aggregation frequency parameter is (except figure 4), and is set to hops. The utility function of a peer with uptime and maximum connections with neighbours is defined as .
Figure 4 shows the average precision of estimation as a function of time and compares the results obtained for three different values of . The best approximation, close to , is obtained for . Random fluctuations are visible.
Figures 5 and 6 show the average error of the aggregation algorithm, , , and , as functions of the churn rate and network size. The variance is not shown as it is approximately two orders of magnitude lower than the plotted values. The churn rate is measured as the number of substituted peers per time step. The estimation of is the most precise as the algorithm for the maximum calculation is simpler compared to the algorithm for and estimation. approximation is less accurate than since the histogram changes more dynamically than the number of peers. The relative error as a function of the number of peers is bounded as the number of rounds in the epoch is proportional to , which corresponds to the theoretical analysis of Kempe.
Figure 7 shows the average hop count of search messages delivered to peers above the utility threshold as a function of the number of peers in the system. The hop count is nearly constant since the percentage of high utility peers in the system is fixed (1% of the system size). Figures 8 and 9 show the average failure rate when searching for peers above the utility threshold as a function of the number of peers in the system and as a function of peer churn rate. All three figures demonstrate superior performance of gradient search over random walk.
In this paper we have shown that the combination of a peer utility metric, aggregation techniques, and gradient topology with gradient searching allows the discovery of super-peers in peer-to-peer environments. Decentralised aggregation techniques reduce the uncertainty about the system by approximating peer utility distribution, and enable the decentralised and adaptive calculation of super-peer utility thresholds. The neighbour selection algorithm used in the gradient topology allows peers to self-organise themselves and to create a system-level gradient structure based on a local peer utility metric. The information contained in the topology enables the efficient searching for (super)peers above a given utility threshold.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
Copyright © 1993, 1994, 1995, 1996,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The translation was initiated by Jan Sacha on 2006-07-04