Categories and Subject Descriptors: A.1 [General Literature]: Introductory and Survey; C.2.4 [Computer-Communication Networks]: Distributed Systems–Distributed applications; H.1.1 [Models and Principles]: Systems and Information Theory–General systems theory; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval–Clustering; information filtering; relevance feedback; search process; selection process; I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence–Coherence and coordination; intelligent agents; multiagent systems
General Terms: Algorithms, Design, Management, Performance, Reliability
Additional Key Words and Phrases: Adaptive systems, Complex systems, MANET, Peer-to-Peer, Self-Organisation
ACM Reference Format:
Biskupski, B., Dowling J., and Sacha, J. 2007. Properties and mechanisms of self-organizing MANET and P2P systems. ACM Trans. Autonom. Adapt. Syst. 2, 1, Article 1 (March 2007), 34 pages DOI = 10.1145/1216895.1216896 http://doi.acm.org/10.1145/1216895.1216896.
Designers of distributed systems have recently turned to decentralised mechanisms to build large-scale systems for dynamic network environments, such as Mobile Ad Hoc Networks (MANETs) and Peer-to-Peer (P2P) networks, as traditional approaches to system design based on global knowledge or strict consensus among entities are no longer viable [4, 24]. However, the complexity of decentralised control increases rapidly with system size and dynamism, and researchers have turned to self-organisation as a guiding principle to construct such systems. Self-organisation offers the promise of providing new ways to build distributed systems from massive numbers of low cost hosts that interact and adapt to produce properties such as self-management capabilities, robustness and adaptability to a dynamic environment.
The properties and constraints on mechanisms that are required for a system to be described as self-organising with emergent properties have been, in our opinion, most clearly defined in the field of biology by Camazine et al. as “a process in which pattern at the global level of a system emerges from the numerous interactions among lower-level components of the system. Moreover, the rules specifying the interactions between the system’s components are executed using only local information, without reference to the global pattern” . This definition captures important aspects of self-organisation such as autonomous components that take decisions using only local information, and how interactions between components cause system properties to emerge. However, this definition does not say anything about how the system or its constituent components interact with the system’s environment, or how interaction with an external environment can be used to guide self-organisation to produce macro-level structures .
There has been extensive research on self-organisation in biological, social and physical systems [12, 47, 34, 2], producing a common set of concepts with which we can discuss these systems. We use these concepts to propose an agent-based model of self-organising MANET and P2P systems and inform a review of three selected network systems. We use this model to describe the properties of the reviewed systems and the mechanisms used to establish those properties. As part of our model, we identify several design constraints and guidelines for engineering self-organising applications in the target network environments. While we only analyse self-organising MANET and P2P systems in this paper, we believe that our model can be extended to a general class of self-organising distributed systems. Our model and review contrast with previous research that examined the application of theories of self-organisation to algorithms and computer applications, such as in [59, 16, 4], in that we evaluate how self-organisation is used in these systems to improve their operation by adapting to a dynamic environment.
The rest of the article is organised as follows. In Section 2, we motivate the use of self-organisation for building distributed systems by discussing the limits of existing consensus-based techniques, for increasing systems size and dynamism. In Section 3, we provide brief reviews of the systems AntHocNet, SAMPLE and Freenet that we will use in the rest of the paper to show how our model is realised. In Section 4, we describe the model of autonomous agents, their partial view of the system and the influence of the environment on the system design. In Section 5, we formalise how self-organising MANET and P2P systems, and the reviewed systems in particular, can improve desired system properties, such as network throughput, by adapting their structure and behaviour to their application, network and host environments. Finally, we conclude and give some directions for future work.
Traditionally, distributed systems designers have used techniques such as centralised state, group communication protocols , dynamic software architectures , tuple spaces  and interaction protocols  to coordinate the behaviour of system components to produce desirable system behaviour. These techniques all rely on some type of global knowledge that can be characterised as a synchronised view among a set of participants on the value of some shared state.
In decentralised systems, global knowledge of the system includes the local state and local environment of every component in the system. Components can coordinate their behaviour to improve desirable properties of the system, if they are able to reach consensus (agreement) on the state of the system and its environment. A number of different consensus models have been developed that provide components with different levels of consistency on the state of the system and its environment at any instant in time. Strict consensus models, based on components viewing a total ordering of the updates to the system state, provide components with strong consistency guarantees on the state of the system, but they do not scale for a large number of participants due to the number of messages that need to be sent between them. For many applications strict consensus requirements can be loosened and techniques such as virtual synchrony , based on causal ordering of system events, can be used, enabling systems to scale better. However, Van Renesse states in  that: ”traditional consensus protocols [...] have costs linear in system size [...]. With as few as a few hundred participants, such a solution would break down.”. More recently, eventual consistency models have become widely adopted , such as those based on system-wide gossiping , and gossiping within partitions in a statically partitioned system .
The general trend in designing larger distributed systems has been to develop consensus protocols that reduce the number of messages that need to be delivered to all components in the system to synchronise on a shared view, but at the cost of lessening consistency guarantees. However, the aforementioned consistency models still provide a system-wide view and require that messages be eventually sent to all participants, even if those messages are not relevant to all of them. For dynamic environments, such as P2P networks and MANETs, two costs associated with this approach increase with system size: the time required to propagate a changed view to all participants and the frequency with which the view is updated. As systems in these environments scale, they have increased node churn rates , link state changes, and more application data to process . Other decentralised techniques such as self-stabilising algorithms  define global system states (correct, safe, and erroneous states) and components attempt to stabilise the system within a bounded number of steps, i.e., eventually achieve consensus that the system is in a correct state . Examples of self-stabilising algorithms include routing algorithms , e.g., Routing Information Protocol (RIP) and Open Shortest Path First (OSPF), but these algorithms break down in highly dynamic environments .
In effect, as systems increase in size and dynamism it becomes increasingly difficult for systems based on either eventual consistency or self-stabilising algorithms to adapt and improve their operation to an unpredictable environment. However, in large-scale systems operating in dynamic environments, localised groups of components are still able to establish weak consensus on the state of a relatively small part of the system. Techniques for establishing localised consensus can reduce the amount of information propagated in the systems, compared to system-wide gossiping, without the need to statically partition or group components using global knowledge.
We focus our review of self-organising distributed systems on systems from MANET and P2P domains. MANET and P2P systems exhibit many common properties that justify describing them in one paper. Common properties include a dynamic network environment with high levels of node churn, changing network link quality and variation in the amount of resources nodes contribute to the system. These properties have led to the development of techniques that allow decentralised coordination of entities and their continuous adaptation to a changing network environment.
The criteria we used to select systems for this review includes systems that: consist of autonomous interacting entities; have no use of global knowledge (allowing for the bootstrap problem in P2P systems); have a dynamic environment; and adapt to changes in their environment to improve some desirable system property. P2P systems that do not meet these criteria, but have stimulated interest in self-organisation in distributed systems, include: systems based on relatively static structures that do not evolve as the environment changes, such as Distributed Hash Tables (DHTs) [60, 49, 53, 40]; super-peer architectures based on manual or centralised super-peer election approaches [70, 72]; peer-to-peer multicast systems based on centralised multicast tree management algorithms ; and hierarchically partitioned gossiping systems (such as Astrolabe ). Many of the existing MANET routing protocols, which can be characterised as reactive, zone-based and cluster-based , do not meet our criteria. In the related area of sensor networks, existing systems that describe themselves as self-organising incorporate centralised entities such as cluster-heads or sink nodes, and are therefore not considered here .
While much of the discussion on self-organisation and our later model is applicable to distributed systems, in general, we restrict our review to three systems in P2P and MANET domains that meet our criteria, as they consist of decentralised, autonomous entities that operate in a dynamic environment, adapt their operation to changes in their environment, and have emergent properties, such as localised consensus between entities. The reviewed systems are diverse as they operate in different network environments (MANET and P2P), use different algorithms and adapt to different properties of these environments. Other existing systems that meet our criteria, but are not reviewed, include the SG-1 self-organising protocol for construction and maintenance of super-peer overlays , the T-MAN gossip-based protocol for the topology management  and the AntNet mobile-agent-based routing protocol . SG-1 and T-MAN are not reviewed as their self-organising mechanisms aim at achieving system-wide eventual consistency, while we are more interested in localised consistency mechanisms. AntNet is similar to MANET routing protocols reviewed in this paper.
MANETs are a class of wireless network where mobile nodes interact and communicate with each other in an ad-hoc manner. MANETs lack any fixed infrastructure, and in routing protocols for MANETs all nodes have a routing agent that forwards packets over multiple hops, hopefully towards their destination. P2P systems refer to a general class of wide area networked systems that use distributed resources to provide some service in a purely decentralised manner.
The network environments of MANETs and P2P systems have different underlying capabilities that can be used to design self-organising behaviour. MANETs provide network broadcast functionality and promiscuous listening on the shared network medium, allowing the easy propagation of information between nearby hosts and the discovery of new nodes in a system. In contrast, P2P systems are typically built as IP overlay networks that define some communication abstraction over the IP network. IP overlay networks offer no broadcast or promiscuous listening capabilities, and centralised or well-known name services are typically used to overcome the bootstrap problem. However, P2P systems enable an agent to directly communicate with any other agent, allowing systems to adapt their topologies arbitrarily. MANETs, in some ways, are a more challenging network environment as no single agent can be used globally to coordinate system behaviour.
The reviewed systems use approximate optimisation algorithms to solve optimal path problems (paths to destination nodes in the MANET routing and paths to files in the P2P system). The algorithms used are approximate optimisation algorithms as they only attempt to provide a good enough solution to a problem in reasonable time, whereas optimal algorithms strive for convergence on a global optimal solution . However, the reviewed systems deal with problems not typically addressed by optimisation algorithms, including: the concurrent solution to many different optimisation problems, the loss of solution parts by agents leaving the system, the discovery of new solutions as agents join the system, and solution cost generation using measurements taken at the agents’ local network environments.
We discuss the reviewed systems as examples of multi-agent systems, where an agent is an autonomous entity that is “situated in some environment, and that is capable of autonomous actions in this environment in order to meet its design objectives” [68, 38], following the Wooldridge and Jennings definition.
AntHocNet  is a self-organising routing protocol for MANETs. AntHocNet belongs to a group of network routing protocols based on the Ant Colony Optimisation (ACO) meta-heuristic , which in turn was inspired by the observed behaviour of ants foraging for food in ant colonies. Systems based on ACO are modelled as a finite set of nodes (components), connections between the nodes and a population of ants wandering between the nodes and laying a volatile substance called pheromone that evaporates over time. Ants wander randomly in the absence of pheromone trails at a component, whereas if pheromone trails are present on connections, ants preferentially traverse the connection with higher pheromone intensity. Ants can add pheromone to a connection as they traverse it, generally adding more pheromone if the path is shorter or of higher quality. This form of indirect communication between ants about the path quality is called stigmergy. When ants follow the simple behaviour described above, the colony as a whole, with high probability, can solve the problem of finding the shortest path from a set of paths between a pair of nodes. This is achieved by many ants concurrently and randomly exploring their environment, and laying pheromone more frequently and in higher amounts on shorter paths, which in turn attracts more ants to those paths and creates a positive feedback loop of pheromone accumulation. The majority of initially random ants converge onto the shortest paths with the highest pheromone concentration. These paths, called pheromone trails, are emergent structures generated by ants exploring the environment and indirectly interacting with each other. The pheromone trails evaporate over time, allowing the colony to explore new paths over time.
The ACO meta-heuristic was successfully applied to many routing algorithms where separate packets, called ant packets, are proactively sent into the network to continuously sample possible paths and update routing (pheromone) tables at nodes in the system. Ant-based routing protocols include ABC  and AntNet  that were designed for wired networks, as well as the more recent AntHocNet for MANETs. Ant-based routing algorithms are suitable for MANETs due to their decentralised nature, high robustness to node failures, load balancing and adaptability to highly dynamic environments. However, the use of only proactive ant packets to discover optimal routes led to problems in previous work on applying ACO to MANET routing in PERA . It has been shown that reactive protocols for MANETs, such as Ad Hoc On-Demand Distance Vector routing (AODV) , demonstrate better performance in MANETs, mainly due to the high network dynamism .
AntHocNet adapted the proactive nature of the ACO meta-heuristic to build a hybrid protocol, which combines reactive route set up with proactive route maintenance. Each node maintains a routing table, where for each known destination, an entry consists of a vector of real-valued elements, called pheromone values, with one element for each neighbour. A neighbour is defined as a node within wireless communication range. A pheromone value is an estimation of the quality of the route to the destination over a particular neighbour. Pheromone values are continuously updated according to the path quality values calculated by reactive and proactive ants. When a node wishes to send a data packet to a destination that is not in the local routing table, a reactive ant packet is sent to discover and set up an initial path to the destination. The reactive ant packet is broadcast, hence, all nodes within wireless broadcast range, i.e., the node’s neighbours, receive it and if their routing tables contain the destination, they forward the reactive ant packet to the next hop with a probability proportional to its pheromone value. If the destination is unknown again, a node re-broadcasts the reactive ant packet as before. This procedure is repeated until a reactive ant packet reaches the destination, whereupon it is converted to a backward ant packet that retraces its path to the source along the same path, updating pheromone values in routing tables. Routing tables are updated using a combination of the locally sensed size of the packet queue and the estimated time required to route a packet to the destination. Pheromone values are updated gradually, biasing their values towards the newly computed ones. If the backward ant cannot be delivered to the sending node, for instance due to node movements, the sending node reaches its maximum number of broadcasts and the maximum waiting time for the backward ant, assumes that the destination is unreachable, and discards all buffered packets to this destination.
Once the paths are set up by possibly different reactive ants that reached the destination, data packets can be routed to the destination (see Figure 1). Data packets are unicast probabilistically in a similar manner to the reactive ant packet, but with a lower probability of selecting suboptimal paths, i.e., every node that receives a data packet selects the next hop with a probability proportional to the next hop’s pheromone value. When a data session has been established between nodes, the source node periodically sends out a proactive forward ant packet at a rate of one for every few data packets. The proactive forward ant is probabilistically unicast to the next hop, using the same formula as reactive forward ants, but there is also a small probability of a broadcast instead of a unicast in order to discover new nodes and paths in the network. As in the case of reactive ants, on the way back to the source node, proactive ants update pheromone values along the path with the quality values collected by forward ants. In addition to these proactive routing features, nodes periodically broadcast the pheromone values in their routing tables to all their neighbours. This enables the diffusion of pheromone values throughout the network, independent of ant activities. Furthermore, nodes use periodic broadcast to notify each other that they are alive.
In simulations based on Qualnet , the AntHocNet algorithm has been shown to be superior to a state of the art MANET routing protocol, AODV, in terms of packet delivery ratio, average end-to-end delay and average jitter, whereas it is less efficient in terms of routing overhead since it requires many control messages to be sent. AntHocNet produces superior performance to AODV because feedback between different ant packets on route quality produces localised consensus between nodes, in the form of pheromone trails, on the best routes in the system, enabling nodes to share in the work of exploration the network environment for better routes. In contrast, AODV is a reactive protocol that does not learn route quality. The performance of AntHocNet demonstrates the potential for self-organising approaches to building distributed systems in MANETs.
Similar to AntHocNet, SAMPLE  is a self-organising routing protocol for MANETs. Its design is based on Collaborative Reinforcement Learning (CRL) , an extension to Reinforcement Learning (RL)  with support for online multi-agent learning. Collaborative Reinforcement Learning (CRL) is designed to coordinate the solution to discrete optimisation problems (DOPs) in a multi-agent system in order to optimise desirable system properties, such as system throughput or robustness. A CRL system is a decentralised system of partially connected RL agents, where any agent can potentially initiate a DOP that is solved by some (possibly different) agent in the system. Each agent uses a local model of both its environment and its neighbours to attempt to minimise the cost in solving a DOP. A DOP is solved by an agent either by executing an action to solve the DOP locally or, if the estimated cost is lower, an action to delegate the DOP’s solution to a neighbour. Agents improve their local model using both evaluative feedback on the success of past actions to solve the DOP, and by acquiring environmental feedback from the agent’s application, host and network environment, e.g., using monitoring subsystems as found in autonomic systems . Agents also provide one another with collaborative feedback about their estimated cost of solving different DOPs, enabling agents to coordinate the solution to DOPs and also optimise desirable system properties.
SAMPLE is based on CRL and for each different known destination agents model a routing decision as solving the DOP of finding the (neighbouring) agent with the lowest estimated cost of routing a data packet to that destination. SAMPLE is a purely reactive protocol, in which routing tables with estimated costs for different destinations are updated when users provide data packets to be routed in the network. In contrast to AntHocNet, SAMPLE does not employ separate control packets for route discovery and maintenance but, instead, control data is attached to user data packets, thus eliminating the routing overhead introduced by proactive ant packets in AntHocNet.
In SAMPLE, data packets are routed using both the estimated route costs from neighbours (the next hop) to a destination and the estimated connection costs to those neighbours, where an agent’s neighbours are the set of nodes within wireless communication range. Agents acquire estimated costs to destinations from their neighbours using advertisement of estimated route costs. The next hop for a packet is selected based on the sum of the advertised costs of each neighbour and the estimated connection cost of using a network link to that neighbour. Connection costs are calculated using the probability of a packet being successfully sent over the network link, based on a recent sample of packets sent over that link.
Agents advertise their routing costs reactively by piggy-backing estimated route costs to both the packet’s source and destination nodes inside each data packet. Neighbouring nodes receive all data packets sent over the shared wireless communication channel by promiscuously listening to all data packets, and use the advertised route costs to update and improve the quality of their local routing tables and connection cost models (see Figure 2). In order to remove stale routing information from the routing tables, SAMPLE uses route decay, where estimated costs in routing tables for non-advertised routes grow steadily higher over time, hence, gradually eliminating these routes from consideration from routing decisions. This is similar to the idea of pheromone trail evaporation used in ACO , and it means that agents adapt to recent routing traffic, thus preventing convergence on stale routes.
SAMPLE supports implicit exploration of the network for better routes by routing packets probabilistically. SAMPLE also includes a discovery action, implemented as a broadcast packet, to find new nodes and routes in the network. The discovery action is always executed when there is no routing entry available for a destination. However, as Boltzmann action selection  is used to select actions, discovery actions can also be executed, with low probability, during packet forwarding to attempt to discover new nodes, and possibly more optimal routes, e.g., to adapt to network congestion by discovering uncongested routes.
SAMPLE has been specifically designed for a type of MANET scenario where Internet access is provided to mobile nodes in a metropolitan area. In this type of network, a majority of traffic terminates at the few servers in the network that provide the Internet access; there also are a group of stable, fixed nodes with stable routes to the servers. Experimental results of SAMPLE in the NS-2 network simulator  showed that more popular traffic destinations have higher quality routes, since routes to these nodes are more explored and advertised. An emergent property of traffic flowing over the stable routes to the servers can be observed as more traffic flows from mobile nodes to the servers. This emergent property results from many agents using feedback from their local environment and feedback from neighbours to collectively adapt their routing behaviour to favour stable network links. In the presence of stable links, high congestion and a high level of agent dynamism, SAMPLE has been shown to have significantly better network throughput and packet delivery performance than both AODV and Dynamic Source Routing (DSR)  protocols [18, 24].
Freenet  is a self-organising P2P system that allows for publication, replication, and retrieval of data, while protecting the anonymity of both authors and readers. Freenet can be seen as a distributed storage application in which every peer contributes some storage space to the system, and which supports two operations: data insert and data retrieval. Deletion of data is implicit, i.e., items which are not accessed for a period of time are removed automatically. Freenet is a P2P system where the global structure is not predetermined, but emerges in a bottom-up manner. Each peer in the system reactively updates its connections to neighbours using a local routing table adaptation algorithm. A peer’s routing table contains addresses of neighbouring peers and the data keys it believes they hold. Data keys are used to identify data items; these location-independent identifiers allow for the insertion, removal and discovery of data items in the system. In the current Freenet implementation , keys are obtained by applying the 160-bit SHA-1 hash function to a data item. Every peer maintains a local, encrypted data repository where other peers can insert a data item indexed by a key. Peers know only the identifiers of the locally stored data, which is sufficient for answering user requests.
Freenet’s routing mechanism is especially important from the perspective of self-organisation. User requests are routed from one peer to another using a hill-climbing algorithm with backtracking  to decide the location of the next hop. When a peer receives a query, it first checks its own repository, and if it finds matching data, sends it back along the request path. Otherwise, the peer forwards the request to the peer in its routing table with the closest key (determined by lexicographic distance) to the one requested, i.e., hill-climbing search. If a peer sends a query to a recipient that is already on the request path, the message is bounced back and the peer tries to use the next-closest key instead. If a peer runs out of candidates to try, it reports failure back to its predecessor on the path, which then tries its next best choice, and so on, i.e., backtracking. In order to limit resource usage, each request is given a hops-to-live limit that is decremented at each peer and when it reaches zero, the request fails. If a request is ultimately successful, it returns the data back to the upstream requester, where the data is cached locally, and a new entry is added in the requester’s routing table associating the actual data source with the requested key. This way, subsequent requests for the same key will be immediately satisfied from the local cache, whereas requests for “similar” keys will be forwarded to the previously successful data source (see Figure 3).
The presented routing table adaptation mechanism leads to large improvements in routing performance over time. Peers use feedback from successful user requests to adapt their routing tables to specialise in handling clusters of similar keys (determined again by lexicographic distance) . Since each time a peer succeeds in handling a request, it is entered in another peer’s routing table, this increases the probability that it will receive requests for keys that are similar to the key it handled. As the peer gains more experience in handling queries for those keys, it will successfully answer them more often and, in turn, get asked about them more often, in a kind of positive feedback loop. Similarly, peers’ repositories will specialise in storing clusters of data items with similar keys. The creation of new entries in the routing table, after successfully answered requests, is the crucial aspect of the neighbour selection algorithm that leads to the emergent property of clustering of network connections to peers with popular data items. However, one problem with Freenet is that multiple clusters can form in the network containing similar keys, leading to performance problems when the hill-climbing search algorithm causes a request to search clusters (hill tops) that do not contain the data item (as it may be in a different cluster containing similar keys), and ultimately time out. The emergent clusters are localised, not global structures. In general, there is a lack of published material on the performance of Freenet, mainly due to the difficulty in performing experiments due to the strong support for anonymity in Freenet. However, experiments that simulated Freenet’s performance suggest that its performance is acceptable for highly replicated data, but poor for difficult-to-find data .
The review and analysis of AntHocNet, SAMPLE and Freenet show that these systems have many structural features and mechanisms in common. One essential component of these self-organising distributed systems is the autonomous agent that cooperatively solves distributed problems.
In this paper, the reviewed systems are considered multi-agent systems [69, 68, 38, 26], as they meet general characteristics of multiagent systems : agents have incomplete information or capabilities for solving the problem; there is no system global control; data is decentralised; and computation is asynchronous. The agents in the reviewed systems respond in a timely fashion to changes in their local environments, proactively and reactively exchange messages with one another, maintain models of their local environment, and take actions in order to satisfy both individual agent problem solving and system improvement goals. Of the three general classes of interactions in multi-agent systems: cooperation, coordination, and negotiation; the agents in the reviewed systems demonstrate cooperation in routing packets and finding files and coordination in adapting and improving collective routing behaviour in a dynamic environment. There is no negotiation between agents since our agents are not self-interested.
This section describes the environment in MANET and P2P systems and how agents maintain partial views of the system in order to improve performance and adapt to a changing environment.
An agent in a distributed system executes in a local environment consisting of its local host, operating system, network, and local applications or users interacting with the agent. A local environment is defined as everything that is external to the agent and the system (other agents are not external to the system), that has both a direct impact on its operation and can be directly sensed or manipulated by the agent. Thus, from the agent’s point of view, users and external applications are part of the agent’s local environment (see Figure 4). An agent has interfaces to its local environment that determine the nature and scope of what the agent can sense and manipulate in its local environment. Agents may also be mobile, changing their local environment.
Most environments in which self-organising distributed systems operate are inaccessible, non-deterministic, and dynamic. In the particular case of MANET and P2P network environments, the state of network links is inaccessible in practice as network links must be tested to establish whether they are working (an expensive operation); routing actions are non-deterministic, having unpredictable outcomes; and the network state is dynamic, as external factors such as wireless interference and network congestion affect link quality.
Formally, every agent i in the system operates in a local environment, where the local environment’s state at time t is denoted as
where E is the set of all possible local environment states, and i Nt, with Nt defining the set of agents in the system at time t. The total system environment et comprises the union of the local environments (potentially overlapping) of all agents in the system at time t
where denotes the set of all possible system environment states.
The state of the agent’s local environments is total with respect to the agent and independent of the state of the agent. The local agent environment can be modified by actions of the agent itself and actions of other agents operating in it. However, there is an uncertainty in the outcome of agent actions on the state of the environment due to a lack of knowledge of other agent actions as well as non-determinism and dynamism of the environments.
In highly distributed systems, no agent possesses a global view of the entire system and its environment as this is not feasible in any system with high complexity (see Section 2). Similarly, an agent’s view of its local environment can be limited and inaccurate due to inaccessibility, non-determinism and dynamism. Consequently, an agent typically maintains a partial view that encompasses a model of the system and of its local environment. An agent’s partial view is constructed using information received from other agents and information observed from its own local environment.
In order to solve distributed problems, a partial view should contain models for estimating the state of relevant, inaccessible parts of the system or its environment. These models allow agents to take actions based on the state of their local partial view, without the need to first communicate with other agents, which introduces scalability problems and is generally not feasible in real-time environments. However, if agents are to provide good solutions to distributed problems, they must coordinate their behaviour and this can be achieved by the convergence of agents’ partial views. Agents typically adapt and converge their partial views using feedback from their local environment and feedback from other agents. The adaptation and convergence of partial views allows agents to coordinate their action selection and to improve problem solutions. The agent’s partial view includes:
An agent’s internal state consists of all components in the agent’s partial view, i.e., its set of neighbours (neighbourhood), the model of its local environment, partial knowledge from neighbouring agents and localised system properties. The internal state of agent i at time t is denoted as
where I is the set of all possible internal states of the agent i Nt. It should be noted here that while sti includes an agent’s model of its local environment at time t, eti represents the actual state of the environment at time t. An agent’s model of the local environment can deviate from the state of the local environment, and should be kept accurate by frequent feedback from the local environment.
The following subsections describe each element of the agent’s partial view (internal state), and relate the reviewed systems to this model, which is summarised in Table 1.
In decentralised systems, an agent has a neighbourhood, that we define as the set of agents with which it can communicate (i.e., send messages) directly at a particular moment in time. An agent may be aware of other agents outside its neighbourhood, e.g., an agent may know the address of some remote agent and communicate with it through other agents in a multi-hop manner; but if it cannot directly communicate with them, they do not belong to its neighbourhood. The neighbourhood is dynamic with respect to its size and membership, and it must be managed and updated over time. In systems based solely on stigmergy, such as AntNet , where communication between agents is indirect and mediated by the environment, an agent’s neighbourhood is considered to be empty, by our definition. In AntHocNet, however, we consider the routing programs on the hosts to be agents and the reactive and proactive ants to be messages. AntHocNet’s routing programs are autonomous programs that actively maintain a view of their neighbouring nodes.
The agent, in order to interact with other agents and hence participate in the system, needs to initialise its own neighbourhood by discovering other agents in the system and informing them about its existence. This process is often called bootstrapping or discovery [41, 63]. Once the agent discovers a connected agent in the system, it can find other agents and build its own neighbourhood. However, as systems grow in size, agents are not able to maintain and constantly update information about all potential neighbouring agents. Agents often refine their neighbourhood to some limited number of agents by evaluating the relative “fitness” of their neighbours and potential neighbours. The agent’s neighbourhood needs to be also continuously updated, by agents removing neighbours that leave the system. Agents leaving the system might notify neighbours before they leave, but a notification mechanism is not sufficient in the case of arbitrary failures. The only indication to other agents of such a failure is absence of any activity at the failed agent.
In AntHocNet, the agent’s neighbourhood consists of all agents in the agent’s wireless broadcast range. These are the only agents that it can directly communicate with. It can reach other agents only in a multi-hop manner. However, the agent may not know about all its direct neighbours. The agent’s knowledge about neighbours is reflected by its routing table, where each entry corresponds to a direct neighbour. Wireless broadcast and promiscuous listening enable agents to learn about neighbours and bootstrap their routing tables. The routing table entries need to be continuously updated by inserting neighbours that join the system and removing neighbours that fail or leave the system. For failure detection, AntHocNet uses a common heartbeat mechanism , where hello messages are periodically sent to neighbours to determine their availability. Unresponsive neighbours are removed from routing tables.
Similarly in SAMPLE, the agent’s neighbourhood consists of all agents within the agent’s wireless broadcast range, and it is maintained in the agent’s routing table. Again, wireless broadcast and promiscuous listening are used to bootstrap an agent’s routing tables. In contrast to AntHocNet, SAMPLE uses a decay mechanism for agent failure detection, where the routing table entries (corresponding to a neighbour) are degraded over time, in the absence of receiving messages from neighbours. If a routing table entry corresponding to a neighbour reaches some “staleness” threshold, the neighbour is removed from the agent’s routing table.
In Freenet, the agent’s neighbourhood consists of a limited number of agents that have been most recently used for routing. The Freenet protocol does not specify a bootstrap mechanism, but Freenet implementations support centralised seed nodes that can be initially added as neighbours. Once an agent makes an initial connection and issues requests for data, it learns about other agents that successfully answer requests and inserts them in its neighbourhood. A maximum neighbourhood size prevents unbounded neighbourhood growth, set at 250 neighbours in simulation , and when the agent completes a subsequent user request, the Least Recently Used (LRU) entry in the routing table (i.e., a neighbour) is removed to make way for the new entry. This way, assuming enough user requests are satisfied by neighbours over time, inactive neighbours are eventually removed from the neighbourhood.
In other systems, agents sometimes attempt to optimise their set of neighbours using some evaluation function that measures the relative “fitness” of their neighbours and potential neighbours. Distributed Hash Table P2P systems select neighbours based on their unique identifiers [60, 53, 51] and additionally refine them using some proximity metrics such as latency. In super-peer networks, agents adapt their neighbours based on agent resources and capabilities (e.g., high bandwidth, low latency, large storage space or high uptime) . In multi-agent systems, where agents may have different functionality, agents select neighbours based on their capabilities to solving required tasks .
In dynamic network environments, observing the state of local network links is an expensive operation. The reviewed MANET systems overcome this problem by maintaining a model of their network links, that they use to make decisions about the probability of transmission over a link succeeding, see Table 1. Similarly, aspects of the agent’s application and host environments can be modelled to help agents estimate their state. Local environment models are stored in the agent’s partial view.
Local environment models are typically estimators for aspects of the environment that are relevant to the agent for distributed problem solving or for improving behaviour. As agents often take decisions based on the state of the local models, instead of the state of the actual environment, the models need to be continuously updated by gathering information from the local environment. However, local models are not sufficient for agents to take actions that improve system performance. For instance, a routing agent could select a locally optimal action of forwarding a network packet to a neighbour that has the lowest latency, but this locally optimal decision may not be globally optimal if the selected neighbour has only poor quality paths to the destination. Thus, the agent needs to learn about remote regions of the environment through the exchange of partial views with other agents.
In AntHocNet, an agent builds a local model of its network environment by monitoring the size of its local queue of packets that are to be sent at the MAC layer, and by measuring the average time between the arrival of a packet at the MAC layer and the time needed for successful transmission. This information, together with estimated route costs to destinations received from neighbours, is used by the agent to estimate the total cost of routing to known destinations.
In SAMPLE, an agent builds a local model of its network environment by storing a sliding window of the observed number of successful transmissions to failed transmissions to each of its neighbours, at the MAC layer. The sliding window, for each neighbour, is used to estimate the probability of a successful packet transmission over the link to its neighbour. This model is then combined with advertised route costs to destination to estimate the total route cost of delivering a packet to a destination.
Freenet’s representation of the environment contains information about the locally available data items and their keys as well as the reachability of the neighbouring hosts. The information about the locally available data items is propagated to neighbouring agents, enabling them to locate these data items. Recently, in other P2P systems, estimated models of network links have been used to demonstrate improved system performance for DHT-based systems .
Agents that solve distributed problems need to acquire knowledge about remote regions of the system (beside their own local knowledge) from agents that explore these regions. The problems that agents solve are known as tasks, and are defined by some specification . Agents cooperate in task solution and each agent typically has a model of the estimated costs of solving tasks by each neighbour. Agents use local models to make decisions about task delegation to a neighbour, since the cost of communicating with neighbours before deciding on the best neighbour for a task is often prohibitive, especially for increasing numbers of neighbours. The first version of Gnutella is a well known example of how flooding neighbours with queries, without using a knowledge from other agents, can severely impact system scalability .
In self-organising distributed systems, information contained in partial views is shared among agents to promote convergence between agent partial views. There are different methods for sharing knowledge, but the basic design choice is between proactive and reactive mechanisms . Proactive knowledge exchange can be initiated by agents at any time, typically periodically, whereas reactive knowledge exchange is only triggered when some external source (such as the agent’s environment) introduces new information to the system. The choice of proactive and/or reactive mechanisms should take into consideration the expected rate of interactions between agents due to the external environment.
The sharing of knowledge introduces a trust problem into the system, and the model of self-organisation presented in this paper assumes that agents interactions are trusted and cooperative, as is the case in the reviewed systems. Agents are trusted if they behave according to the system’s rules and do not try to disrupt the correct functioning of the system. Agents cooperate in that they contribute their own resources (e.g., bandwidth, storage space) in order to increase the system utility, rather than maximise their individual utility. The issue of support for trust models is considered outside the scope of this paper and has been addressed elsewhere [3, 48, 11, 15].
In AntHocNet, the knowledge that neighbours exchange consists of the estimated costs of routing to selected destinations. AntHocNet uses both reactive and proactive mechanisms for sharing this knowledge, including reactive ants to setup routing paths, and proactive ants to periodically sample routing paths during data traffic.
Similarly, in SAMPLE, neighbours exchange estimated costs of routes to destinations. However, only reactive mechanisms for exchanging routing knowledge are used during both route setup and normal routing operation. Routing cost and node availability information is piggybacked in routing packets, which neighbouring agents promiscuously receive and use to adapt their routing tables.
In Freenet, the knowledge that agents exchange is the agents ability to locate particular keys. A reactive mechanism for knowledge exchange is used; when a data item is successfully located, agents along the request path update their local sets of keys to reflect that they can now locate that item.
Other mechanisms seen in different multi-agent systems for propagating and adapting partial views of agents include proactive gossiping protocols in Newscast  and Astrolabe , reactive event-based notification in Chord  and Pastry , stigmergy mechanisms in AntNet , and knowledge and information exchange languages for MAS [27, 28].
From the combination of local environment models and knowledge received from neighbouring agents, an agent can reason about localised properties of the system, such as a next hop on an optimal routing path to a desired destination. These estimated properties enable coordinated agent actions that are globally near-optimal and result in the system behaving as a coherent whole.
In AntHocNet and SAMPLE, routing costs are aggregated over the set of agents in a routing path, and thus make up a property of the system that is localised to a group of agents. Agents use the search algorithms ACO and CRL, respectively, to make routing decisions to destinations in remote parts of the system. These search algorithms can generate good global decisions by basing their decisions on both local environment models and the estimated cost of routing via a neighbour to a known destination.
In Freenet, when an agent needs to locate a data item that is not available locally, it selects a neighbour that specialises in locating the closest key. As agents that provide similar keys tend to cluster, the local knowledge of key clustering helps solve the global problem of finding good paths to a destination.
One of the main goals when engineering self-organising systems is the improvement of desired system properties, given the state of the system and its environment . Many different properties of distributed systems can be recast as problems that can be quantified and subsequently improved or optimised. Examples in the reviewed systems include maximising routing performance, minimising packet loss, and maximising the clustering of similar data items. Related concepts for evaluating the performance of multi-agent and distributed systems include Wooldridge’s system utility [67, 68], Wolpert’s world utility that is used to rate the collective behaviour of agents in his Collective Intelligence (COIN) model , and Babaoglu’s figure of merit (FOM) that is used to rate the sensitivity of the system to a dynamic environment .
In this section, we introduce a formal model of self-organisation in distributed systems as the adaptation of the system to improve desired system properties in the current system environment. No implementation or formal validation of the model is provided, but we show how the model is realised in the reviewed systems. Our approach is based on models proposed for multi-agent systems [68, 67, 30]. Our model should be useful in aiding system designers understand how techniques such as feedback models and evaluation functions can be applied to build self-organising distributed systems with desirable system properties.
The set of all agents’ internal states comprises the system state. The system state at time t is denoted as st,
where Nt is the set of agents in the system at time t, sti I are the internal states of each agent, as defined in Formula 1, and is a set of all possible system states.
Similar to Wooldridge’s multi-agent model , we can model system behaviour in a given environment as a sequence of pairs of the system state and the total environment state (i.e., sequence over ×),
Each system state transition is caused by transitions of individual agent internal states, which are, in turn, caused by agent action execution, receiving feedback from neighbouring agents and the local environment, and decay mechanisms. We introduce all these concepts in the rest of Section 5. The transitions of the environment state are caused by agents modifying their local environments and dynamism of the environment, see Section 4.1.
Desired system properties in a multi-agent system can be optimised if they can be recast as approximate optimisation problems that can be subsequently maximised or minimised. Formally, we introduce an abstract system utility function that measures the quality of some desired system property for a particular state of a system and the system’s environment
We can also say that the utility function measures how well the system state is matched to the environment (where higher values are preferable). Self-organising systems should be engineered to adapt towards states that increase the system utility. We define the set of optimal system states for a given environment e as
Hence, at time t, the set of optimal system states is t⋆ = ⋆(et). Given system behaviour as a transition of system and environment states, we say that the system state converges toward the optimal states if
In order for agents to adapt to an optimal state, they require a relatively stable environment; an unrealistic assumption in MANET and P2P networks. For systems in dynamic environments, their goal is to reach good enough, near-optimal states in reasonable time. Systems have to trade off their speed of adaptivity for their ability to find near-optimal states.
The presented utility function gives some high-level intuition in how a self-organising system behaves, however, it does not provide any insights on how to realise this behaviour. This is because agents have only limited view on the system and do not have access to the system utility function. In the reviewed systems, the utility function could measure the quality of routing tables, i.e., how precisely routing tables reflect the current environmental conditions. In optimal system states, St⋆, the routing tables should enable all agents to select routing paths for packets (or requests) such that they optimise some routing metric, such as maximising network throughput or packet delivery ratio.
The challenge in building systems that improve system utility in a dynamic environment is to coordinate agent actions that are based only on the agents’ local states and to adapt agent states to a changing environment. This is difficult for the following reasons:
The first three uncertainties result directly from the agent’s lack of global system knowledge. The last two uncertainties are caused by inaccessibility, dynamism and non-determinism of the environment.
In the following sections we show a model for how agents in self-organising systems select their local actions based on local state, and how they use feedback and decay to adapt their local state to their environment and neighbours. We then discuss, using our model, how systems can increase their utility in a dynamic environment.
Some form of consensus on the state of the system and the environment, between a subset or all of the system’s agents, is required in order to coordinate agent behaviour, and, consequently, improve desired system properties . Agents proactively or reactively exchange selected parts of their state, promoting convergence between neighbouring agents on their view of the system, thus enabling a form of consensus to emerge. Agents share the minimal, but complete, state necessary to cooperate and solve their distributed problems, e.g., routing agents only share best estimated routing distances to destinations rather than all estimated distances using all of their neighbours.
The model of consensus we are interested in is not system-wide, but localised to groups of agents in the system, and there is no management of group membership. Agents can adapt their state to become closer to one another using feedback mechanisms, introduced later in Section 5.5. We start defining consensus between a set of agents by first introducing an abstract distance metric that measures the difference between internal states of two agents:
A system is said to have reached strict global consensus at time t when the distance between any two agent states in the system is zero:
where Nt is the set of all agents in the system at time t. However, this form of consensus is not possible as it would require that all agents possess synchronised global knowledge of the system. Thus, agents only strive to achieve localised forms of consensus between smaller groups of agents N′t ⊂ Nt. This consensus is also weak, as agent states are not necessarily strictly consistent; their distance is limited by some constant, ɛ:
The consensus between localised groups of agents in self-organising systems is often eventual (see eventual consistency models in Section 2), meaning that in a stable environment agents’ internal states converge over time:
In the reviewed systems, a form of localised weak consensus emerges between the agents. However, due to the dynamism of the environment and non-linear adaptations produced by inter-agent feedback, see Section 5.5.1, it is mathematically intractable to derive all the values of i,j N′t and ɛ using an analytical approach. Nevertheless, given a reasonably accurate model of the system’s environment, these values can be observed empirically through experimentation. It can be shown that statistically, within a given confidence interval, that agents maintain localised weak consensus on some state of the system.
In AntHocNet, proactive ants improve consensus between agents that lie on their paths between the source and destination. Since ants are forwarded with higher probability to agents situated on the high quality paths, more effort is dedicated to improve consensus on high quality paths, which are also more likely to be exploited for routing. Furthermore, routing information is gradually diffused between agents by periodic broadcasts. Through these mechanisms, localised groups of routing agents, delimited by their physical proximity, can establish weak consensus on the best quality routes to popular destinations in the network, given a stable network environment.
In SAMPLE, localised weak consensus between agents is achieved through advertisements and promiscuous listening. When a data packet is sent by an agent, all neighbouring agents (i.e. agents within the wireless communication range) receive information about the cost of the path from that agent to the packet’s original source and destination (see Figure 5). Furthermore, routing costs are advertised through broadcasts. Advertisements can propagate over multiple hops in the system when neighbours adapt to advertisements and re-advertise their new paths. Since popular destinations are advertised more often, better consensus emerges on the cost of paths leading to them, and consequently lower cost paths are discovered to popular destinations.
In Freenet, the key clustering mechanisms group together agents specialising in locating and storing data items associated with similar keys. Agents located within the same clusters have similar routing tables and store similar keys, and hence, are close to each other in terms of the dist metric and reach a localised weak consensus. The consensus can be seen to emerge between agent routing tables on the best agents to use for different keys. For each successful request, agents on the request path improve their consensus on how to locate the requested key. These agents also improve consensus on the location of other data items that have similar keys since agents specialise in routing to clusters of similar keys.
Agents need some model for selecting actions that both solves agent-level tasks and strives to improve system utility in a given environment. Individual agent behaviour can be abstractly defined by three functions that describe how agents evaluate their available actions in the current state, select an action, and execute the selected action. First, we introduce an eval function that is used by agents to estimate the utility of the possible actions, given the current agent state
where Ac is the set of all actions available at the agent. In systems where agents share a common goal, as in the systems reviewed in this paper, the evaluation function should be designed in such a way that action utility calculated by eval corresponds to the action’s estimated effect on system utility. This is a challenging problem that we return to in Section 5.6.
Subsequently, an action is selected based on the output of the evaluation function. Since agents only have a partial view of the system, estimations of action utility may be inaccurate, as the local state may not accurately model the current system and environment state. Thus, an agent needs to trade-off the exploitation of the knowledge of its current state (executing locally higher utility actions) with exploration for new states (executing lower utility actions). Formally, we define an action α Ac executed by an agent i at time t as exploratory if:
Otherwise, the action selected exploits knowledge in the current local state if
This leads us to define a probabilistic model for action selection that allows for the selection of both exploitative and exploratory actions as
where Ac is the set of all probability distributions over the set of actions. Thus, the select function returns a discrete probability distribution PAc Ac, such that ∑ αAcPAc(α | sti) = 1. Typically, such a probability distribution is designed so there is a higher probability that actions with higher estimated utility are selected, over actions with lower estimated utility. An action selected from the probability distribution is executed by the agent, where an execute function is defined as
The execute function captures how an action may involve sensing the agent’s local environment, and how it results in a new agent state.
In the reviewed systems, AntHocNet agents evaluate actions, including data routing and sending a reactive or proactive ant, based on the availability of routes and estimated route costs for neighbouring agents. If routes are available to a destination, routing actions for data traffic are selected using a probabilistic model that is tuned in experiments to favours better routes. If no route is available, an action to send a reactive forward ant is selected that uses a probabilistic model to discover a route that is tuned to favour exploration. Actions to send proactive ants for route maintenance are triggered after n data routing actions.
SAMPLE uses CRL to evaluate agent actions based on the local environment model, the availability of routes and the estimated route costs for neighbouring agents. If routes are available to a destination, SAMPLE selects routing actions using a probabilistic policy called Boltzmann action selection. If no route is available, a broadcast action is selected that floods the network to discover a route.
Freenet evaluates actions based on the distance of the required key to each neighbours’ advertised keys. In Freenet actions are limited to unicast routing. Freenet uses a deterministic, hill climbing search algorithm when handling user requests for keys. Initially agents have no information about keys associated with the neighbouring agents, thus routing involves much exploration that gradually reduces as agents specialise in locating clusters of similar keys.
Agent adaptation involves the updating of agents’ internal states, and consequently their behaviour, as a result of action execution (Formula 2), receiving feedback from neighbouring agents and the local environment, and decay mechanisms. Agents encode information from local and remote parts of the system locally in their state or in the system topology by reconfiguring their neighbourhoods. Adaptation can help reduce the uncertainties that prevent agents from selecting actions that improve system utility. We describe here how agents adapt their behaviour using two mechanisms: feedback and decay.
Feedback is a form of communication of information between agents in a self-organising system about the state of a part of the system or its environment. It is a mechanism that enables groups of agents to reach some form of consensus on their state and on their environment.
Two possible sources of feedback for an agent are its local environment and its neighbours. Agents use feedback from their local environments and neighbours to update their local state. Formally, we can define feedback from an agent j to an agent i produced at time t as a feedback message, f , that contains a part of agent j’s state
Feedback messages are generated by agent actions
where is the set of all possible feedback messages, I is the set of all possible agent states, E is the set of all possible local agent environment states and Ac is the set of all possible agent actions. A feedback message for an agent can also be produced by the agent’s local environment. It is then represented as a subset of the state of the local environment in which agent i operates, f ⊆ eti E. Agents that receive a feedback message from the local environment or neighbouring agents, use it to adapt their local state that we formally represent by an adapt function:
Feedback is the main mechanism used for the adaptation of agent behaviour; by adapting an agent’s state, feedback can modify an agent’s action selection policy. Environmental feedback enables an agent to learn about the state of its local environment, and agents that share the same local environment (such as agents deployed on the same host) can share the same local environment model. Inter-agent feedback (feedback between agents) is generated by the proactive or reactive sharing of selected agent state and enables the convergence of agents’ state. Agents should not share their entire partial view, but only information relevant to neighbours, e.g., information common to problems they are solving.
Two variants of feedback are possible in dynamical systems: positive and negative feedback that, respectively, amplify or decrease the selection of actions by agents. Formally, we say that positive feedback f amplifies the use of action α Ac by agent i if the updated agent state, sti, improves the utility of that action:
Similarly, negative feedback can be defined for an action as one that decreases the probability of an action being selected. Positive feedback loops can form at the system-level, where some agent adapts its local state and produces feedback to neighbouring agents, which in turn adapt themselves, affecting their neighbours, often resulting in system-wide adaptations.
Inter-agent feedback, whether positive or negative, improves consensus between agents. When agent j sends a feedback message, f, to agent i and agent i adapts its local state, the distance between the internal states of agents i and j decreases
Similarly, when agent i adapts its state si to the feedback received from the environment, si becomes more consistent with ei (which can be formalised by introducing appropriate metrics).
In the reviewed MANET systems, inter-agent feedback consists of estimated route costs for destinations. Feedback is considered positive or negative, according to whether it increases or decreases the cost of a particular routing path in the agent’s routing table. Positive feedback processes cause more data packets to be attracted to a path, resulting in more feedback being generated for that path. Negative feedback, on the other hand, decreases the attractiveness of the routing path and consequently reduces the amount of feedback generated for the path (see Figure 6(a)). Only local adaptations that change an agent’s best estimated route cost to a destination are further propagated to neighbours. As such, cascading updates in a system are often the result of positive feedback processes where new better paths are discovered. The presence of positive and negative feedback loops leads to non-linear system behaviour, where small causes can have large effects, making the systems less amenable to formal analysis. Non-linear interactions between agents can be observed in SAMPLE and AntHocNet when an agent on a stable path or trail to a popular destination unexpectedly fails. This relatively small event (from a system perspective) can trigger a feedback process whereby the routing policies of a large number of agents are adapted until agents converge on a new route to the popular destination.
In AntHocNet, inter-agent feedback about estimated route costs is carried by reactive and proactive ants. The local network environment provides feedback to agents about the size of local packet queues. The implementation of the ACO algorithm uses both forms of feedback to adapt routing table entries.
In SAMPLE, inter-agent feedback is provided by unicast and multicast packets that carry route cost advertisements and that agents receive by promiscuously listening. The local network environment provides feedback about the success or failure of packet transmissions. The CRL algorithm uses inter-agent and environmental feedback to adapt the routing table entries and the local network link models, respectively.
In Freenet, feedback is supplied to agents when they handle a user request. Agents receive feedback on whether the request has been successfully handled. If it has, the agent associates the requested key with the agent that fulfilled the request in its routing table. This adaptation of agent routing tables causes more requests for similar keys to be sent to this agent, triggering a positive feedback loop (see Figure 6(b)). The positive feedback loop causes agents to specialise in both routing for clusters of similar keys and, as agents cache each reply locally, storing clusters of data with similar keys. However, Freenet’s performance suffers due to excessive positive feedback, where agents over-specialise in clusters of keys. Excessive clustering adversely affects the efficiency of Freenet’s request routing performance for requests that are not in the agent’s local neighbourhood . Zhang improved Freenet’s neighbour reconfiguration algorithm by allowing for probabilistic selection of long-range random links. Random links act as negative feedback on excessive clustering. Zhang showed how this negative feedback can help stabilise an agent’s routing table and improve system performance for request searching.
One of the limitations of the feedback model is that when agents leave the system, their state may be still included in neighbouring agents’ partial views. Failed agents cannot produce feedback messages to inform their neighbours of their departure from the system. A similar problem can be found when an agent stops receiving feedback from the local environment; this may lead to divergence between the state of the agent’s local environment model and its real environment. To handle these cases, agents can decay their partial view over time. Decay models are useful in ensuring that agents’ partial views do not diverge too much from the actual state of the system’s environment, as they require a constant flow of feedback from the environment and other agents to maintain stable agent states. Decay can be described by a function that each agent applies to its partial view:
In AntHocNet, pheromone values in route tables decay over time in a process known as evaporation. In the absence of routing traffic, route costs to neighbours are gradually increased. However, routes are only removed when a neighbour’s availability changes, and this is monitored by proactive hello messages.
In SAMPLE, decay is a form of time-driven, single-agent negative feedback, i.e., it does not cascade to other agents or produce feedback loops. Agents decay the estimated route costs in the routing tables at every discrete time unit, i.e., they increase the cost of routing actions. In the absence of routing packets from the application environment, the stable paths that are formed through positive feedback processes gradually degrade in quality until the routing actions for those stable paths are removed from routing table entries.
In contrast, Freenet’s decay model is not based on updating agent state at discrete time steps. Routing tables are decayed when user requests are fulfilled; the decay model causes the LRU entry in its routing table to be removed. However, as decay is not based on elapsed time, the absence of user requests can lead to stale routing tables that can contain entries for agents that have left the system.
The presented model describes the self-organising behaviour of MANET and P2P systems, as improving system utility through agents using local information to adapt to a dynamic network environment. A summary of the model is presented in Figure 7. It explains how agent-level mechanisms such as inter-agent feedback, environmental feedback, decay and agent adaptation enable agents to coordinate their behaviour to improve system utility. Agents in our model use only information from their local state, without reference to any global knowledge, to improve system utility.
We believe that our model is also applicable to understanding and building self-organising multi-agent systems, in general. It allows us to ask questions such as: given a group of agents, an environment representation, and a problem to solve, how do we construct:
The main challenges in building self-organising distributed systems using our model are the design of: environmental feedback models that match an agent’s internal models to real environment; inter-agent feedback models that promote localised consensus between agents; and action evaluation functions that improve system utility in a stable environment. For local environment feedback, information gathered from the agent’s local environment must somehow be transformed into an efficient representation, stored in the agent’s partial view. For inter-agent feedback models, distributed problems must somehow be represented in a form where less than strict consensus between partial views is acceptable for applications.
The relationship between action evaluation functions and system utility is a particularly challenging problem. In cooperative systems, where agents share a common goal (as in the reviewed systems), the evaluation function can be designed as a utility function. However, if there is no feedback between agents or feedback between agents and the environment, agents have high uncertainty as to the utility of available actions due to their lack of global knowledge or the state of the environment. Probabilistic action selection is one approach that enables agents to explore their environment to find better solutions to problems (helping improve the system utility). However, models where agents only use exploratory actions to improve their estimation of the action utility often decrease system utility, due to exploratory actions increasing usage of the system’s resources. Feedback models, in contrast, are more efficient at allowing agents learn about the state of the system and the environment, as a feedback message can be generated by a single agent executing an action, but received by many agents. Feedback enables the collective, asynchronous improvement in the quality of partial views at many agents, helping improve agents’ ability to estimate the utility of actions. The convergence of agent state through feedback can help agents to coordinate their behaviour to improve system utility, provided that local environmental models accurately model the real environment.
The use of feedback to improve system utility can be observed in congestion scenarios in the MANET systems. When the level of traffic over a path reaches a state of congestion (decreased system utility), agents with a new routing problem avoid using the congested path through feedback from the local environment model and feedback from remote agents. The agent does not need to route packets on the congested path in order to discover its state of congestion, which would decrease system utility. Instead, feedback enables a localised group of agents to reach consensus that the path is congested, and agents then take actions using local state to avoid that path. In this way, evaluation functions that use state models updated by environmental and inter-agent feedback, can be based on approximate optimisation algorithms that maximise the agent’s local utility, while simultaneously improving system utility.
In the reviewed MANET systems, designers favoured experimentation and simulation to evaluate system utility. This is due to the difficulty in formally validating convergence properties of multi-agent systems in dynamic environments with non-linear system behaviour [65, 4]. Experiments require a realistic model of the system’s real environment, provided by the network simulators used by SAMPLE and AntHocNet, where system utility values such as network throughput and packet delivery ratios can be measured over different network setups and experimental runs.
In this paper, we presented an abstract agent-based model of self-organising MANET and P2P systems and showed how it is realised in the reviewed systems. The model describes how agent behaviour, based on local state models of the environment and neighbours, can be adapted to improve overall system behaviour, through feedback generated from a dynamic environment. Agents continuously use feedback from neighbouring agents and their local environments to improve the quality of their partial view of the system, allowing localised groups of agents’ partial views to converge, thereby enabling coordinated agent behaviour. Coordinated agent behaviour, in turn, can help improve desired system properties. In general, feedback models can reduce the need for agents to take actions to explore the system’s environment and can reduce the amount of message passing required to coordinate agent behaviour; both helping to increase system utility. The agent-level mechanisms that our model covers are action evaluation functions, action selection policies, feedback, and decay.
We believe that our model can be used to inform the construction of distributed systems with improved performance in dynamic environments. For example, in existing state of the art DHT-based P2P networks, system structure is determined by node identifiers, rather than the state of the environment. We have been investigating a Gradient topology where system structure captures information about node uptime and performance characteristics . We are also using our model to help inform other research in building P2P multicast streaming protocols that adapt to heterogeneity in their network environment , and self-organising traffic lights that optimise vehicular traffic .
We would like to thank Giovanna Di Marzo Serugendo and the anonymous reviewers for their valuable comments and suggestions to improve the quality of this paper.
 Ozalp Babaoglu, Geoffrey Canright, Andreas Deutsch, Gianni Di Caro, Frederick Ducatelle, Luca Gambardella, Niloy Ganguly, Márk Jelasity, Roberto Montemanni, Alberto Montresor, and Tore Urnes. Design patterns from biology for distributed computing. ACM Transactions on Autonomous and Adaptive Systems, 1(1):22–66, 2006.
 J. Baras and H. Mehta. A probabilistic emergent routing algorithm for mobile ad hoc networks. In WiOpt 2003: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, Sophia-Antipolis, France, 2003. IEEE Computer Society Press.
 K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. In SOSP ’87: Proceedings of the eleventh ACM Symposium on Operating systems principles, pages 123–138. ACM Press, 1987.
 Bartosz Biskupski, Raymond Cunningham, Jim Dowling, and Rene Meier. High-Bandwidth Mesh-based Overlay Multicast in Heterogeneous Environments. In Proceedings of the International Workshop on Advanced Architectures and Algorithms for Internet Delivery and Applications, Pisa, Italy, October 2006. ACM Press. to appear.
 V. Cahill, E. Gray, J.-M. Seigneur, C. Jensen, Y. Chen, B. Shand, N. Dimmock, A. Twigg, J. Bacon, C. English, W. Wagealla, S. Terzis, P. Nixon, G. Serugendo, C. Bryce, M. Carbone, K. Krukow, and M. Nielsen. Using Trust for Secure Collaboration in Uncertain Environments. IEEE Pervasive Computing Magazine, 2(3):52–61, 2003.
 Scott Camazine, Nigel R. Franks, James Sneyd, Eric Bonabeau, Jean-Louis Deneubourg, and Guy Theraula. Self-Organization in Biological Systems. Princeton University Press, Princeton, NJ, USA, 2001.
 Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proceedings of the International Workshop on Designing Privacy Enhancing Technologies, pages 46–66. Springer-Verlag, 2000.
 Raymond Cunningham, Jim Dowling, Anthony Harrington, Vinny Reynolds, René Meier, and Vinny Cahill. Self-optimization in a next-generation urban traffic control environment. ERCIM News - Special: Emergent Computing, 64:55–56, January 2006.
 Eoin Curran and Jim Dowling. SAMPLE: Statistical network link modelling in an on-demand probabilistic routing protocol for ad hoc networks. In Proceedings of the 2nd Conference on Wireless On demand Network Systems and Services, pages 200–205. IEEE Computer Society, January 2005.
 K. Decker, K. Sycara, and M. Williamson. Middle-Agents for the Internet. In Proceedings of the 15th International Joint Conference on Artificial Intelligence, pages 578–583, Nagoya, Japan, 1997. IJCAI.
 G. Di Caro, F. Ducatelle, and L.M Gambardella. AntHocNet: An adaptive nature-inspired algorithm for routing in mobile ad hoc networks. European Transactions on Telecommunications, Special Issue on Self-Organization in Mobile Networking, 16:443–455, 2005.
 J. Dowling, E. Curran, R. Cunningham, and V. Cahill. Using feedback in collaborative reinforcement learning to adapt and optimise decentralised distributed systems. IEEE Transactions on Systems, Man and Cybernetics (Part A), Special Issue on Engineering Self-Organized Distributed Systems, 35(3):360–372, 2005.
 T. Finin, R. Fritzson, D. McKay, and R. McEntire. KQML as an Agent Communication Language. In N. Adam, B. Bhargava, and Y. Yesha, editors, Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM’94), pages 456–463, Gaithersburg, MD, USA, 1994. ACM Press.
 Sanny Gustavsson and Sten F. Andler. Self-stabilization and eventual consistency in replicated real-time databases. In WOSS ’02: Proceedings of the first workshop on Self-healing systems, pages 105–107. ACM Press, 2002.
 Michael N. Huhns, Vance T. Holderfield, and Rosa Laura Zavala Gutierrez. Achieving software robustness via large-scale multiagent systems. In SELMAS, pages 199–215, Orlando, Florida, 2002. Springer.
 Márk Jelasity and Özalp Babaoglu. T-man: Gossip-based overlay topology management. In Engineering Self-Organising Systems, volume 3910 of Lecture Notes in Computer Science, pages 1–15. Springer, 2006.
 Gurmeet Singh Manku, Mayank Bawa, and Prabhakar Raghavan. Symphony: Distributed hashing in a small world. In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, pages 127–140. USITS, March 2003.
 Alberto Montresor. A robust protocol for building superpeer overlay topologies. In Proceedings of the 4th International Conference on Peer-to-Peer Computing, pages 202–209. IEEE Computer Society, August 2004.
 Venkata N. Padmanabhan and Kunwadee Sripanidkulchai. The case for cooperative networking. In IPTPS ’01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, pages 178–190, London, UK, 2002. Springer-Verlag.
 H. V. D. Parunak, S. A. Brueckner, J. A. Sauter, and R. Matthews. Global convergence of local agent behaviors. In Proceedings of the 4th International Joint Conference on Autonomous Agents and Multi-Agent Systems, volume 1, pages 305–321. ACM, 2005.
 Charles E. Perkins and Elizabeth M. Royer. Ad-hoc On-demand Distance Vector routing. In Proceedings of the 2nd Workshop on Mobile Computer Systems and Applications, pages 90–100. IEEE Computer Society, 1999.
 Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Schenker. A scalable content-addressable network. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, pages 161–172. ACM Press, 2001.
 Robbert Van Renesse, Kenneth P. Birman, and Werner Vogels. Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Transactions on Computer Systems, 21(2):164–206, May 2003.
 Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, pages 329–350. Springer-Verlag, 2001.
 Jan Sacha, Jim Dowling, Raymond Cunningham, and René Meier. Discovery of stable peers in a self-organising peer-to-peer gradient topology. In Proceedings of the 6th IFIP International Conference on Distributed Applications and Interoperable Systems, number 4025 in LNCS, pages 70–83. Springer-Verlag, June 2006.
 G. Di Marzo Serugendo, N. Foukia, S. Hassas, A. Karageorgos, S. Kouadri Mostéfaoui, O. F. Rana, M. Ulieru, P. Valckenaers, and C. Van Aart. Self-organising applications: Paradigms and applications. In Proceedings of the Engineering Self-Organising Applications Workshop (ESOA’03). Springer-Verlag, 2004.
 Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Comput. Commun. Rev., 31(4):149–160, 2001.
 T. De Wolf, G. Samaey, , and T. Holvoet. Engineering self-organising emergent systems with simulation-based scientific analysis. In Proceedings of the 4th International Workshop on Engineering Self-Organising Applications, pages 138–152, Hakodate, Japan, 2005. LNCS 3910, Springer.
 Beverly Yang and Hector Garcia-Molina. Designing a super-peer network. In Proceedings of the 19th International Conference on Data Engineering, pages 49–60, Bangalore, India, March 2003. IEEE Computer Society.
 Ben Y. Zhao, Yitao Duan, Ling Huang, Anthony D. Joseph, and John D. Kubiatowicz. Brocade: Landmark routing on overlay networks. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems, pages 34–44, Cambridge, MA, USA, March 2002. Springer-Verlag Heidelberg.