Jan Sacha and Jim Dowling
The peer-to-peer (P2P) paradigm for building distributed systems has become extremely popular, and it has received increasing attention as more novel applications are invented and successfully deployed. The main advantage of the paradigm is that it allows the construction of systems of unprecedented size and robustness, mainly due to their inherent decentralisation and redundant structures. However, the P2P paradigm introduces challenges that many proposed P2P architectures do not deal with properly. Massive scale and very high dynamism make it impossible to capture and maintain a complete picture of the entire P2P network; consequently, a peer or any other entity is only able to maintain a partial or estimated view of the system. Inherent decentralisation, an open environment, lack of trust and unreliable communication introduce distributed decision-making problems. In particular, there is a problem in the database field of how to select the most suitable peers for storing data.
Existing P2P systems, such as Distributed Hash Table (DHT) based approaches, usually assume that all peers are similar and have equal capabilities for maintaining data. For example, in Chord it is assumed that the distribution of resources among peers is uniform. However, this has been shown not to be the case in real-life systems, where the distributions of peer characteristics, such as the number of connections, uptime, available bandwidth or storage space, usually exhibit the so-called scale-free or heavy-tailed property [3,4,5,6]. Systems that do not address the heterogeneity of the environment and do not adapt their structures to it often suffer poor performance, especially in the face of high churn rates, i.e., a high frequency of peers entering and leaving the system [7,8,9].
The main contribution of this paper is a self-organising neighbourhood selection algorithm that clusters peers with similar performance characteristics and generates a gradient network topology, which helps to solve the problem of dynamic replica placement. We briefly describe an approach for master-slave database replication, a heuristic for master election and a heuristic for replica discovery, all of which exploit the properties of the gradient topology in order to improve the stability of the replicas and minimise the overhead of replica maintenance. We evaluate the neighbourhood selection algorithm through simulation and performance measurements.
The rest of the paper is organised as follows. In Section 2 we discuss the general requirements for data distribution and replica placement. In Sections 3 and 4 we introduce the concept of the gradient topology and present a neighbour selection algorithm that generates it. In Section 5 an approach for master-slave replication based on the gradient topology is presented. Section 6 contains a detailed description of the simulation settings and the experimental results. Finally, in Sections 7 and 8 we discuss related work and our plans for future work.
When addressing the persistent data requirements of a distributed system, we must decide where to store the data. There are two extremes: one is to store all data on a centralised server, which introduces scalability problems; the other is to partition the data among a set of peers using some indexing scheme, for example a distributed hash table. Many existing P2P systems assume that all peers have identical capabilities and responsibilities, and that the data and load distribution is uniform among all peers [1,2]. However, this approach has the drawback that using low-bandwidth, unstable or untrusted peers to store data can degrade the performance of the entire network.
To solve this problem, many systems only allow data to be stored on the fastest, highest bandwidth, most reliable and trusted peers, called superpeers [8,9]. This approach, however, introduces the problem of superpeer election. It is not obvious how to identify and select superpeers from the set of peers in the network, mainly due to the lack of global knowledge about the system. Solutions based on flooding potentially require communication with all N peers in the network. Other solutions include hard-wiring superpeers into the system or configuring them manually; however, this conflicts with the assumptions of self-management, decentralisation and the lack of a central authority controlling the structure of the system. An adaptive, self-organising system is preferable, in which peers automatically and dynamically elect superpeers according to demand, available resources and other runtime constraints. Alternatively, the system may forgo the binary distinction between superpeers and ordinary peers and instead assign roles to peers according to their capabilities.
The selection of peers for replica placement could potentially be based on criteria such as peer stability, available bandwidth and latency, storage space, processing performance, an open IP address and willingness to share resources. Peer availability, or uptime, is especially relevant, since every peer entering or leaving the system introduces extra overhead, due in part to required data transfers or routing structure reconfiguration. The system could also employ a peer reputation model and use it as a criterion for replica placement; for example, only the most trusted peers might be allowed to host a replica.
We define a peer's utility as a function of the above parameters. The utility is a critical factor in the algorithm that generates the P2P gradient topology and is used in the replica placement strategy presented in this paper.
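As an illustration, a peer's utility might be computed as a weighted combination of such parameters. The sketch below is only a minimal example: the parameter set follows the discussion above, but the weights, reference maxima and normalisation are our own assumptions, not values prescribed by the system.

```python
# Sketch of a peer utility function. The parameters mirror those listed in
# the text; all weights and reference maxima below are illustrative
# assumptions, not values taken from the paper.
def utility(uptime_hours, bandwidth_kbps, storage_mb, open_ip,
            w_uptime=0.4, w_bandwidth=0.3, w_storage=0.2, w_ip=0.1):
    """Combine peer characteristics into a single scalar utility in [0, 1]."""
    # Normalise each raw metric against an assumed reference maximum.
    u = min(uptime_hours / 720.0, 1.0)      # cap at ~1 month of uptime
    b = min(bandwidth_kbps / 10000.0, 1.0)  # cap at ~10 Mbps
    s = min(storage_mb / 100000.0, 1.0)     # cap at ~100 GB
    return (w_uptime * u + w_bandwidth * b + w_storage * s
            + w_ip * (1.0 if open_ip else 0.0))
```

Any monotone combination would serve equally well for the topology; what matters is only that peers can be totally ordered by the resulting scalar.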
In a closed system, where all peers trust each other, it is sufficient that every peer evaluates its own utility level. Neighbouring peers can exchange the utility information without any verification procedure, since trust is assumed. In an open, untrusted environment, a separate mechanism is needed to assure that the utility claimed by the peers corresponds to their actual status. Managing trust in a decentralised system, however, is beyond the scope of this paper and will not be discussed further.
A key principle in our approach is that persistent data is stored by the highest utility peers. This strategy addresses a problem faced by many existing P2P systems, where some data items, especially less popular ones, are hardly accessible or even lost due to peer instability or a lack of resources such as storage space or bandwidth. In our design, the system tries to maximise data availability, security and quality of service by placing data replicas on the most reliable, high performance hosts.
We propose a P2P topology, which we call a gradient topology, where the highest utility peers, maintaining persistent data, are highly connected with each other and form a logical core of the network, while the network around the core is composed of peers considered less performant or less reliable (see Figure 1). Assuming the scale-free, heavy-tailed distribution of resources [4,5,6], a great majority of peers belongs to the latter category. The number of core peers is relatively small, but the amount of contributed resources and the stability of core peers are significantly higher than those of the outer peers. The main advantages of grouping high utility peers in a logical core are discussed below.
In practice, the core peers act more as servers, while the outer peers act more as clients. The core peers should be well-connected, have high bandwidth and processing power, and should be able to maintain a relatively high number of connections. On the other hand, peers far from the core are not suitable for maintaining replicated data, due to poor reliability, resource constraints or the owner's unwillingness to contribute resources to the system. It is not desirable to place such peers in the core since they would decrease the system's performance.
In order to create the gradient P2P topology and enable the emergence of a stable core, we initially considered the following neighbour selection rule: two peers may become neighbours if they have a similar utility status. Additionally, high utility peers should have more neighbours, since they have more resources available to maintain network connections.
However, our early experiments showed that a greedy selection policy, where peers always select neighbours with characteristics most similar to their own, leads to high network clustering and, in consequence, long distances between peers. In some cases the clusters became disconnected and the network became partitioned. Another problem was that the highest utility peers did not always connect into a single core; sometimes there were multiple clusters of high utility peers separated from each other by a number of lower utility peers.
We improved the algorithm by allowing peers to connect to non-similar peers, with the connection probability decreasing exponentially with the difference in peer utility. However, it turned out that a randomised strategy, where a percentage of neighbours is always selected at random, gave the best results. Random connections serve several purposes. First of all, they allow peers to discover potential neighbours with a similar utility level, even in remote regions of the network, which in turn enables the formation of a single cluster containing the highest utility peers. Random connections thus play a role similar to exploration in traditional multi-agent systems. Second, random links prevent the graph from becoming disconnected. It has been shown that even a small number of random connections, for example 20 per peer, is sufficient to make the probability of network partitioning negligibly small in practice. Finally, our randomised algorithm has the advantage that it is quite simple and does not require tuning parameters specific to the deployment environment, such as the network size or average peer connectivity.
A peer maintains two independent sets of neighbours: randomly-selected connections, and greedily-selected connections to peers with a similar utility status (see Figure 2). New connections are discovered by gossiping, i.e., by randomly contacting existing neighbours and exchanging connection lists with them. It is important to note that the connections inserted into the random sets are always selected from other peers' random sets, which guarantees that these sets remain random. The details of the algorithm implementation and the simulation settings are described in Section 6.
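A minimal sketch of the two neighbour sets and the gossip exchange might look as follows; the set sizes, the merging policy and the class layout are illustrative assumptions rather than the exact implementation.

```python
import random

class Peer:
    """Sketch of a peer holding the two neighbour sets described in the
    text. Set sizes and the merge policy are illustrative assumptions."""
    def __init__(self, name, utility, random_size=5, similar_size=10):
        self.name, self.utility = name, utility
        self.random_set = []    # uniformly sampled peers
        self.similar_set = []   # peers with the closest utility values
        self.random_size, self.similar_size = random_size, similar_size

    def gossip_with(self, neighbour):
        """Exchange connection lists with one neighbour. Entries for the
        random set are drawn only from the other peer's random set, which
        keeps the random sets unbiased."""
        candidates = neighbour.random_set + [neighbour]
        # Refresh the random set from the neighbour's random entries.
        pool = list(set(self.random_set + candidates) - {self})
        random.shuffle(pool)
        self.random_set = pool[:self.random_size]
        # Greedily keep the peers whose utility is closest to our own.
        merged = list(set(self.similar_set + candidates) - {self})
        merged.sort(key=lambda p: abs(p.utility - self.utility))
        self.similar_set = merged[:self.similar_size]
```

Because the similar set is ordered by utility distance, repeated gossip rounds pull each peer towards neighbours of comparable utility, while the random set keeps supplying fresh candidates from remote regions of the network.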
In this section, we demonstrate how the gradient network topology can be exploited by a master-slave database replication strategy. We present an approach, where the replica placement is based on peer utility, available resources and demand. Due to the information contained in the network structure, the selection of high utility peers suitable for replica placement does not require communication between all peers in the system.
We assume that each peer can create an independent database and replicate it over some of the peers in the network in order to improve its availability and persistence guarantees. A peer that creates the first copy of a database, which we call the master replica, becomes the database owner. Subsequent replicas of the database hosted by other peers are called slave replicas. Users issue queries, which can be resolved by any replica. The owner, and potentially other authorised users, can also update or delete a database. There is only one master replica of each database, and it is responsible for handling and synchronising updates. Slave replicas are created automatically, on demand.
We restrict the set of peers that are allowed to create database replicas to those with utility above a replica-suitability threshold. A master replica threshold defines the minimum peer utility required to host a master replica, and a slave replica threshold defines the minimum utility required to host a slave replica. It is important to note that the system does not require any form of consensus among peers on the threshold values, since the thresholds can be determined for each database individually.
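The threshold scheme can be illustrated with a small sketch; the concrete threshold values below are per-database assumptions, not values fixed by the system.

```python
# Illustrative per-database thresholds; the numbers are assumptions chosen
# for the example, not values given in the paper.
MASTER_THRESHOLD = 0.8   # minimum utility to host the master replica
SLAVE_THRESHOLD = 0.6    # minimum utility to host a slave replica

def replica_role(peer_utility):
    """Return the replica roles a peer is eligible for under the thresholds."""
    if peer_utility >= MASTER_THRESHOLD:
        return "master-or-slave"
    if peer_utility >= SLAVE_THRESHOLD:
        return "slave-only"
    return "none"
```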
In our approach, a peer that already hosts a replica, in particular the master, requests a new replica placement on one of its neighbours when a certain condition is met, for example when the number of user queries exceeds some critical level and a new replica is needed to handle the load. It is crucial in this approach that a peer initiating the replication must be able to find other peers suitable for hosting new replicas, i.e., peers with utility above the slave replica threshold. This should be satisfied in the gradient topology, since all peers storing replicas are clustered in the highly connected core, and share a similar, high utility status. Search messages do not need to be propagated to lower utility neighbours, since all high utility peers are located in the core of the network (see Figure 3).
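The placement search described above might be sketched as follows, assuming a simple dictionary representation of neighbours; only neighbours above the slave replica threshold that do not already host a copy are considered, and lower utility neighbours are never contacted.

```python
def find_placement_candidates(neighbours, slave_threshold, current_hosts):
    """Sketch of the replica placement search: a peer already hosting a
    replica examines only neighbours above the slave threshold that do not
    yet hold a replica. The dict-based data structures are illustrative
    assumptions, not the paper's implementation."""
    return sorted(
        (p for p in neighbours
         if p["utility"] >= slave_threshold and p["id"] not in current_hosts),
        key=lambda p: p["utility"], reverse=True)
```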
Replicas are removed on demand, adaptively. When a peer decides that the cost of maintaining a replica is higher than the cost of handling queries, for example when the frequency of queries drops below some threshold value, the replica can be deleted. Alternatively, replicas might be selected for removal using a Least Recently Used strategy, as in Freenet.
Database replicas must be synchronised between the master and the slaves after update operations. We add the constraint that an update operation, whether a modification, an addition or a removal, must be performed on the master replica, while queries can be handled by any slave. This requirement should not significantly reduce the scalability of the system if the rate of queries is much higher than the rate of updates, which is commonly the case, for example, in name services or knowledge base systems. If an update is delivered to a slave replica, the replica forwards it to the master, and the master propagates the update to all replicas. This guarantees that concurrent updates from different peers are serialised and sent in the same order to all copies of the database; hence, there are no write-write conflicts.
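The update path, with all writes serialised through the master, can be sketched as follows; the class layout is an illustrative assumption.

```python
class Replica:
    """Sketch of master-slave update forwarding: every update is routed
    through the master, so all copies apply updates in the same order.
    The class layout is an illustrative assumption."""
    def __init__(self, master=None):
        self.master = master      # None means this replica IS the master
        self.slaves = []          # populated on the master only
        self.log = []             # applied updates, in order

    def update(self, op):
        if self.master is not None:
            self.master.update(op)    # slave: forward to the master
        else:
            self.apply(op)            # master applies first ...
            for s in self.slaves:     # ... then propagates to all slaves
                s.apply(op)

    def apply(self, op):
        self.log.append(op)
```

Since only the master ever invokes `apply` on the slaves, concurrent writers cannot produce diverging logs; write-write conflicts are resolved by the master's arrival order.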
The updates can be propagated either instantaneously, or in a lazy fashion, for example by periodic gossiping, with the latter technique being used to reduce bandwidth consumption and improve scalability at the cost of reduced data consistency. Lazy replica synchronisation can be initiated either by the master or by the slave, for instance after a peer crash or a restart. Thus the system provides eventual consistency of the replicas. The core peers storing replicas should be well-connected, have high bandwidth and processing power, which should enable fast data dissemination and frequent replica synchronisation.
In the P2P gradient topology, peers have relative positions defined by their utility metric. This allows us to develop an efficient election algorithm that does not require flooding: when sending election messages, peers can use a heuristic that excludes peers with lower utility, i.e., peers topologically further from the core (see Figure 4). This heuristic, however, does not guarantee that the highest utility peer will become the master unless all peers in the core are fully connected. Therefore, we modify the heuristic to provide a directed, gossiping election model. The election-initiating peer also sends the election message to a certain number of neighbouring peers with lower utility, but still inside the core, and these peers restrict further propagation of the message to peers with higher utility. Given sufficiently high connectivity between peers in the core, the peer with the highest utility should win the election with high probability.
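The directed election heuristic can be sketched as a traversal that only follows edges towards strictly higher utility, so messages climb the utility gradient instead of flooding the network; the adjacency-list graph representation is an illustrative assumption.

```python
def run_election(start, utilities, neighbours):
    """Sketch of the directed election heuristic: an election message is
    propagated only to neighbours with strictly higher utility, so it
    climbs the utility gradient towards the core. The adjacency-list
    representation (dicts keyed by peer id) is an assumption."""
    best, frontier, seen = start, [start], {start}
    while frontier:
        peer = frontier.pop()
        for n in neighbours[peer]:
            # Restrict propagation to strictly higher-utility peers.
            if utilities[n] > utilities[peer] and n not in seen:
                seen.add(n)
                frontier.append(n)
                if utilities[n] > utilities[best]:
                    best = n
    return best
```

As in the text, success depends on connectivity: if the highest utility peer is only reachable through a path that descends in utility, this sketch will miss it, which is exactly why the paper adds propagation through some lower utility core peers.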
A searching mechanism is needed for peers to discover nearby replicas of a database they request access to. Traditional unstructured systems, such as Gnutella, have used flooding algorithms for finding resources. This approach works well for highly replicated data; however, it does not scale well. A number of techniques have been proposed to improve search in unstructured P2P networks, such as random walks, iterative deepening and routing indices.
For the topology presented in this paper, we are designing a gradient routing algorithm based on two main factors: the neighbour utility information (the utility gradient), and heuristic values learned by the system (as, for instance, in Freenet). By increasing the probability of forwarding queries to high utility neighbours, the algorithm can effectively route queries towards the core of the network.
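A minimal sketch of the utility-biased forwarding step, under the assumption that the next hop is drawn with probability proportional to neighbour utility (the learned heuristic values mentioned above are omitted):

```python
import random

def forward_query(neighbour_utilities, rng=random):
    """Sketch of one gradient routing hop: the next hop is drawn with
    probability proportional to neighbour utility, biasing queries towards
    the high-utility core. The proportional weighting is an assumption;
    the paper's algorithm also uses learned heuristic values, omitted here."""
    peers, weights = zip(*neighbour_utilities.items())
    return rng.choices(peers, weights=weights, k=1)[0]
```

Repeating this step at every hop makes a query perform a biased walk up the utility gradient, reaching the core in a few hops with high probability while still occasionally exploring lower utility regions.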
We evaluated our approach by modelling a P2P network in RePast, a multi-agent simulation toolkit for large-scale systems. The simulation was implemented in Java. A network was created consisting of over 100,000 peers, which we believe is sufficiently large to model realistic large-scale applications. The experiments ran on a Pentium 4 machine with a 3 GHz processor and 3 GB of main memory.
The simulation is based on a discrete time model and all events are executed in discrete time steps. The actions performed by peers are synchronous; however, our algorithm does not rely on peer synchrony, and hence the results obtained in the experiments generalise to asynchronous environments as well. Following the assumptions of decentralisation and the lack of a global view of the system, each peer maintains a limited number of neighbours and performs actions using local knowledge only. We also assume a scale-free distribution of resources, with the maximum number of peer connections following a power-law (Pareto) distribution, starting from 10 connections and reaching about 50 neighbours for the most powerful peers. Similarly, the utility distribution follows a power law. We model network growth by adding new peers at each step of the simulation: we start with a network containing a single peer, and at each time step the size is increased by 1 percent. Additionally, at each step a number of peers are removed, which models random failures or peer departures; the probability of a peer's departure is inversely proportional to its utility. Bootstrapping is implemented with a centralised name repository containing the 50 most recent peer names, where peers are added and removed in FIFO order. Figure 5 shows the pseudo-code of the simulation.
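The growth and churn model can be sketched as follows; the Pareto shape parameter and the departure scaling constant are assumed values for illustration, not settings taken from the paper's simulation.

```python
import random

def simulation_step(peers, rng):
    """Sketch of one step of the growth/churn model: the network grows by
    1 percent, new peers get power-law (Pareto) distributed utility, and
    each peer departs with probability inversely proportional to its
    utility. The shape 2.0 and the 0.01 scaling constant are assumptions."""
    # Grow the network by 1% (at least one peer per step).
    for _ in range(max(1, len(peers) // 100)):
        peers.append(min(rng.paretovariate(2.0) - 1.0, 1.0))
    # Churn: low-utility peers are more likely to leave.
    return [u for u in peers if rng.random() >= 0.01 / max(u, 0.01)]
```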
Figure 6 shows the randomised algorithm performed by each peer at every step of the simulation. A peer maintains two independent sets of neighbours, randomly-selected connections, and greedily-selected connections to peers with similar utility status. The latter set is twice as large as the former. At every step, if the number of neighbours of a peer reaches the maximum, the peer drops a random connection, and attempts to establish a new one by gossiping with a neighbour.
Figure 7 presents a visualisation of a sample network of 200 peers generated by the neighbour selection algorithm. It shows that the topology evolved to the desired form, in which the highest utility peers are clustered together and constitute a stable core.
Figures 8 and 9 show the measurement results obtained from our simulation. We can see that the average path length between peers is relatively low (about 5-6 hops for 100,000 peers), and that it varies with peer utility. The average distance between the highest utility peers is lower than the average distance between lower utility peers. This confirms our observation that the highest utility peers are highly connected with each other and form a stable core of the network.
Most existing P2P systems that exploit the heterogeneity of the environment are based on structured P2P overlays. In Chord it has been noted that a single physical peer can host multiple virtual peers, and therefore the heterogeneity of resources in a DHT network can be addressed by assigning more virtual peers to higher performance hosts. OceanStore proposes to elect a primary tier "consisting of a small number of replicas located in high-bandwidth, high connectivity regions of the network" for the purpose of handling updates, but no specific algorithm for electing such a tier is presented. Mizrak et al. propose a superpeer-based approach for exploiting resource heterogeneity; however, unlike our architecture, it relies on a DHT overlay.
In the field of unstructured P2P networks, there has been a lot of work devoted to searching and to replication strategies, but there has been limited research on network topology generation and peer neighbourhood selection algorithms. Yang and Garcia-Molina investigate general principles of superpeer-based networks and give practical guidelines for the design of such networks; however, they do not describe any specific algorithms for superpeer election or network structure maintenance. The closest work to our approach is that of Jelasity, in particular his research on decentralised topology management (T-Man); however, he does not address the problem of reliable peer discovery and dynamic replica placement in an open P2P system.
This paper describes the general requirements for data replication in peer-to-peer environments. We have proposed a gradient network topology where the highest utility peers are highly connected with each other and form a logical core suitable for maintaining data replicas. A self-organising neighbourhood selection algorithm has been presented that generates the proposed gradient topology by clustering peers with similar uptime and performance characteristics. The algorithm has been evaluated through simulation, and measurement results have confirmed that the approach is scalable and robust.
The main advantage of the gradient topology is that the network structure contains information about the peer utility, which allows the peers to discover other high utility peers without flooding the entire network with search messages. The gradient topology allows the search space to be limited to a small subset of all peers in the system. This property allows us to address the problem of dynamic replica placement, master replica election, and replica discovery in an open, decentralised environment. The gradient topology should also reduce the cost of replica maintenance, since peers storing data replicas are located close to each other and are connected by stable, high-capacity routes.
Our project is still at an initial stage and requires substantial further research. We are planning to develop a heuristic routing mechanism based on the peer utility gradient, which will allow peers to discover database replicas in their proximity. We are also going to evaluate the proposed replica placement strategies and the master election algorithm. We are building a prototype implementation based on the Berkeley DB middleware.