# Course:CPSC522/Adaptive Network Routing using ACO

## Adaptive Network Routing using ACO

In this page we discuss the application and analysis of Ant Colony Optimization (ACO) in building an efficient network protocol.

Principal Author: Kumseok Jung
Collaborators:

## Abstract

Ant Colony Optimization (ACO) is a metaheuristic for finding solutions to combinatorial optimization problems. Inspired by the behaviour of ants, it uses deliberative mobile agents to assess different parts of the problem space, and applies the concept of stigmergy - an indirect and asynchronous way of exchanging information through the environment - to propagate feedback signals for iteratively correcting the current solution until they converge to the optimal solution. The problem of finding the optimal path in a communication network is a multi-objective optimization problem in a dynamic environment, and is not trivial. AntNet applies ACO to implement a routing protocol in datagram networks with irregular topology, and experiments indicate that it outperforms existing state-of-the-art protocols such as OSPF (Open Shortest Path First) currently used in the Internet and is comparable to Dijkstra's shortest path algorithm, which is the theoretical optimal solution.

### Builds on

• Routing Protocol specifies how routers communicate with each other, distributing information that enables them to select routes between any two nodes on a computer network.
• Dijkstra's Shortest Path Algorithm is an algorithm for finding the shortest paths between nodes in a graph.
• Metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem, especially with incomplete or imperfect information or limited computation capacity.[3]
• Swarm Intelligence is a form of collective intelligence that emerges as a result of a group of natural or artificial agents interacting with each other and the environment.
• Ant Colony Optimization is a population-based metaheuristic that can be used to find approximate solutions to difficult optimization problems.[4]
• Combinatorial Optimization is a topic that consists of finding an optimal object from a finite set of objects.[5]
• Multi-agent Systems is an area of artificial intelligence concerned with the study of multiple autonomous agents acting in an environment.
• Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
• Temporal Difference Learning is an approach to learning how to predict a quantity that depends on future values of a given signal.
• Reinforcement Learning is learning by interacting with an environment.

## Content

### Motivation

In a computer network, not all devices have direct physical links between each other and therefore communication between any two devices are established by a series of routers redirecting data packets from the source node to the destination node. Routing involves each device maintaining a table of neighbouring nodes and the relevant statistics, so that an incoming packet can be strategically forwarded to an outgoing link selected from the table. A routing protocol describes the structure of the routing table, the rules for updating the table, and a policy for selecting a node for a given packet. The goal of a good routing protocol is to maximize the overall network performance and to minimize the cost involved; in other words, it must maximize the throughput and minimize the latency.

"the effect of good routing is to increase throughput for the same value of average delay per packet under high offered load conditions and to decrease average delay per packet under low and moderate offered load conditions." -Bertsekas & Gallager, 1992

The problem of routing is difficult because it is intrinsically distributed. A change in the system state, say a node failure, cannot be globally observed immediately, and thus it is impossible to have an up-to-date and complete model of the distributed (global) state. Additionally, although there is certainly a global optimal policy, there is no global decision system. Each node in the network has to make routing decisions based solely on its local policy, using delayed and incomplete information about the global state. It can only be hoped that the policies employed locally by individual nodes add up to the global optimal policy. Another aspect that makes the problem difficult is that the problem itself is stochastic and non-stationary. The theoretical optimal policy varies with time and depends on the traffic status; even if a complete model was available, techniques like dynamic programming cannot be used. Furthermore, the routing problem involves optimizing multiple conflicting and intertwined aspects of the system - an example being the previously mentioned throughput and latency - while considering multiple constraints imposed by the underlying network infrastructure and the demands of the service users. Besides the algorithmic complexity, at the implementation level the solution needs to be fault-tolerant and reliable, being robust against random node failures.

### Approach

#### Taxonomy of Routing Algorithms

At a higher level, routing algorithms can be characterized by the following properties:

• Centralized vs Distributed

In most cases, centralized algorithms are not fault-tolerant and are impractical due to the delays in gathering information about the network and communicating the routing decisions across the network. Static algorithms compute routes based only on the source and destination, but does not take into account the current network state, such as number of packets queued in the outgoing links. For real-world applications, it makes more practical sense to use distributed adaptive routing algorithms so that it can adapt to varying spatial and temporal traffic conditions. For the rest of the discussion in this page, we will focus on adaptive and distributed solution.

One of the issues that arise from using an adaptive algorithm is the problem of oscillating paths, where a constantly changing forwarding policy causes a packet to travel in a cycle.

From an optimization perspective, the routing paradigms can be divided into the following:

• Minimal routing vs Non-minimal routing
• Optimal routing vs Shortest path routing

Minimal routing algorithms only select paths with minimal cost, while non-minimal routers choose from all the available paths using a heuristic. Optimal routing is an approach focused on optimizing a certain function over all the individual links of the network and requires knowledge of the traffic patterns across the whole system. For instance, minimizing the sum of all the link costs is an optimal routing approach. In contrast, shortest path routing is concerned with minimizing the communication overhead between two nodes based on the link states, and without considering a global cost function and without a priori knowledge of the traffic status. This makes shortest path routing algorithms more flexible to system changes and therfore is the most widely used approach, for example in the Internet.

ARPANET, the ancestor of Internet, first used a shortest path routing algorithm called Bellman-Ford in 1969, which belongs to the class of distance-vector algorithms based on the principles of dynamic programming. In Bellman-Ford, each router keeps a routing table of a 3-tuple of the form (Destination, Estimated Distance, Next Hop) defined for all the nodes in the network. Asynchronously and iteratively, the nodes exchange with its neighbours its current estimated distance to other nodes in the network. Each node updates the entries in its routing table corresponding to the node that sent the new values. The process converges in finite time, but it introduces circular paths and therefore was replaced by Shortest Path First (SPF) algorithm in 1980, which is classified as a link-state routing algorithm. The current version in use, Open Shortest Path First (OSPF) improves upon its predecessor and was introduced in 1994. Link-state algorithms keep routing tables much larger than those in distance-vector algorithms, each table essentially representing a dynamic map of the complete network, and involves flooding the network with local state of each node to keep the information synchronized. OSPF requires static management of the topology and delegates congestion control to an external system; it is merely a "best compromise" among efficiency, stability, and robustness[7]. This brief summary of the history of routing algorithms emphasizes the difficulty of implementing a flexible and robust system, and it is worth to consider other perspectives at tackling this problem.

#### AntNet

Di Caro and Dorigo introduced a novel routing protocol named AntNet[1], based on the principles of Ant Colony Optimization (ACO). It attempts to solve the routing problem by taking a multi-agent systems approach, in which multiple mobile agents concurrently and in parallel perform an on-line Monte Carlo (MC) simulation. However, there are some key differences that distinguish the AntNet algorithm from classical Monte Carlo:

• The simulation is "on-line", where the model is updated during the execution. This makes it also similar to Temporal Difference learning.
• The updated model is shared asynchronously with other random MC instances (ants) operating in the vicinity, but not with all the instances.
• A single MC instance solves only its local optimal solution and cannot solve the global optimal solution. The global optimal of the network emerges as a result of individual MC instances concurrently and stochastically discovering partial solutions.

Some parallels with Reinforcement Learning (RL) can also be drawn, since the routing problem can be interpreted as a stochastic time-varying RL problem[1]. In AntNet, each router receives reinforcement signals from the incoming ants to iteratively update its internal model of the world and improves its routing policy towards a local optimum. However, unlike classical RL problems, the routing problem is non-stationary and non-Markovian, making usual RL approaches unsuitable.

The following is a higher level description of the algorithm:

• Each node in the network concurrently sends out mobile agents towards a randomly selected destination.
• A single mobile agent, called an ant, has one behaviour when going in the forward direction, and has a different behaviour going in the backward direction. When going forward, it is called a forward ant, and it travels in the same traffic as the data packets. In contrast, backward ants travel in a privileged traffic.
• The mobile agents act independently and communicate with each other indirectly by reading and writing information in the network nodes.
• Each agent seeks to travel along the shortest path between the source and the destination node.
• During the trip going forward, each ant collects information about time taken to reach an intermediate node and the congestion status.
• At each node, an ant chooses the next hop based on information locally available in the node and its own private information.
• After arriving at the destination, the forward ant trasfers its accumulated knowledge to the backward ant, and the backward ant begins its trip along the same route going in the opposite direction towards the source node.
• During the backward travel, the backward ant updates the routing table in each node based on the information collected during the forward trip.
• After the backward ant reaches the source node, it is destroyed.

The technical details of the algorithm are further discussed in the following sections.

##### Model of a network node
Model of a node in AntNet[2]

In a network consisting of ${\displaystyle N}$ nodes, each node ${\displaystyle k}$ maintains a routing table ${\displaystyle {\mathcal {T}}_{k}}$. The routing table stores probabilities describing the current routing policy at node ${\displaystyle k}$. Each entry in the table indicates the likelihood ${\displaystyle P_{n,d}}$ of selecting neighbour ${\displaystyle n}$ as the next node given the destination node ${\displaystyle d}$, where:

${\displaystyle P_{d}=\sum _{n\in {\mathcal {N}}_{k}}P_{n,d}=1,\qquad d\in \lbrack 1,N\rbrack ,\qquad {\mathcal {N}}_{k}={neighbours(k)}}$

The raw value ${\displaystyle P_{n,d}}$ is not used as-is for path selection, as there are some heuristic factors to be considered that are described in the following sections. Data packets use the same routing table ${\displaystyle {\mathcal {T}}_{k}}$ for selecting the next link, but use a power function ${\displaystyle g(P)=P^{\beta },\quad \beta >1}$ that amplifies the high probability values and suppresses the low probability values, because data packets only need to exploit the information and should not explore.

Each node ${\displaystyle k}$ also maintains another table ${\displaystyle {\mathcal {M}}_{k}}$ of statistics about the current locally-observed traffic. More precisely, ${\displaystyle {\mathcal {M}}_{k}}$ stores the mean travel time ${\displaystyle \mu _{k,d}}$ from current node ${\displaystyle k}$ to the destination node ${\displaystyle d}$, the variance ${\displaystyle \sigma _{k,d}^{2}}$, and an array ${\displaystyle {\mathcal {W}}_{d}}$ storing the travel times for the last ${\displaystyle w=\vert {\mathcal {W}}\vert }$ trips, used to compute the best travel time ${\displaystyle b_{k,d,W}}$ observed during the moving observation window. The mean travel time provides an estimate of the expected time to reach the destination, and the variance indicates the reliability of this estimate. Whenever there is new data available - i.e. travel time experienced by a newly arriving backward ant - the array ${\displaystyle {\mathcal {M}}_{k}}$ is updated with the following (the index ${\displaystyle k}$ will be omitted as all updates are local to node ${\displaystyle k}$):

${\displaystyle \mu _{d}\leftarrow \mu _{d}+\eta (t_{d}-\mu _{d})}$
${\displaystyle \sigma _{d}^{2}\leftarrow \sigma _{d}^{2}+\eta ((t_{d}-\mu _{d})^{2}-\sigma _{d}^{2})}$
where ${\displaystyle t_{d}}$ is the newly observed trip time of the agent going from node ${\displaystyle k}$ to destination ${\displaystyle d}$.

The term ${\displaystyle \eta }$ is analogous to the constant ${\displaystyle \alpha _{k}}$ in the Temporal Difference formula ${\displaystyle A_{k}=A_{k-1}+\alpha _{k}(v_{k}-A_{k-1}),\alpha _{k}={\frac {1}{k}}}$ for computing the running average over ${\displaystyle k}$ samples. The moving observation window ${\displaystyle {\mathcal {W}}_{d}}$ is updated by shifting it and pushing the new trip time ${\displaystyle t_{d}}$ and the best trip time ${\displaystyle b_{d,W}}$ is recomputed with ${\displaystyle b_{d,W}\leftarrow max({\mathcal {W}}_{d})}$. The moving observation window ${\displaystyle {\mathcal {W}}_{d}}$ serves as a short-term memory providing a recent empirical lower bound of the estimated travel time to the destination.

The table ${\displaystyle {\mathcal {M}}_{k}}$ provides an estimate of the travel time to different nodes, and table ${\displaystyle {\mathcal {T}}_{k}}$ describes the "goodness" of each link for a given destination.

##### Model of a mobile agent
Buffers inside a node in AntNet[2]

Given the model of the nodes, each node ${\displaystyle s}$ generates a forward ant ${\displaystyle F_{s\rightarrow d}}$ at regular intervals ${\displaystyle \Delta t}$, travelling from the current (source) node ${\displaystyle s}$ to destination node ${\displaystyle d}$. In order for a forward ant to experience the real traffic load, they are pushed into the same queue as the outgoing data packets. Since the queue is FIFO (First in, first out), if there are data packets waiting to be sent, the forward ant's departure will also be delayed proportionally to the size of the pending queue; this allows information about the congestion to be carried implicitly by the mobile agents. The destination node ${\displaystyle d}$ is selected probabilistically according to the locally observed traffic. The probability ${\displaystyle y_{d}}$ of selecting destination ${\displaystyle d}$ is proportional to the traffic towards node ${\displaystyle d}$, such that:

${\displaystyle y_{d}={\frac {f_{s,d}}{\sum _{d'=1}^{N}f_{s,d'}}}}$
where ${\displaystyle f_{s,d}}$ is some measure (e.g. size of the queue) of the data from node ${\displaystyle s}$ to ${\displaystyle d}$.

During its trip, an ant maintains a record of the traversed path, storing the identifier of a visited node ${\displaystyle k}$ and the time taken to reach that node ${\displaystyle k}$ since its departure from node ${\displaystyle s}$. This information is stored in a stack ${\displaystyle S_{s\rightarrow d}}$ and is part of the network packet representing the ant. Therefore the size of the ant increases linearly with the number of hops, but this increase has minimal effect on the performance of the algorithm.

For each node ${\displaystyle k}$ a forward ant visits along its route, it selects the next node ${\displaystyle n}$ among the neighbours it did not already visit. It picks a node based on a corrected probability ${\displaystyle P'_{n,d}}$ computed as a function of the raw probability ${\displaystyle P_{n,d}}$ stored in the routing table ${\displaystyle {\mathcal {T}}_{k}}$ and the state of the queue at the outgoing link to node ${\displaystyle n}$. The new probability ${\displaystyle P'_{n,d}}$ takes into account a heuristic correction factor ${\displaystyle l_{n}}$ that reflects the relative size of the queue of the outgoing link to node ${\displaystyle n}$:

${\displaystyle l_{n}=1-{\frac {q_{n}}{\sum _{n'=1}^{\vert {\mathcal {N}}_{k}\vert }q_{n'}}}}$
${\displaystyle P'_{n,d}={\frac {P_{n,d}+\alpha l_{n}}{1+\alpha (\vert {\mathcal {N}}_{k}\vert -1)}}}$

The heuristic correction factor ${\displaystyle l_{n}}$ indicates how congested the outgoing link is, and since ${\displaystyle l_{n}}$ is inversely proportional to the size of the queue, the corrected probability ${\displaystyle P'_{n,d}}$ becomes smaller than the raw probability ${\displaystyle P_{n,d}}$ if the corresponding outgoing link is relatively congested. This allows packets to take alternative routes in the case the theoretical best path has too much traffic. The constant ${\displaystyle \alpha }$ determines the importance of the correction factor ${\displaystyle l_{n}}$, and experiments show that setting the value of ${\displaystyle \alpha }$ between the ranges 0.2 and 0.5 yields good results. Setting higher values of ${\displaystyle \alpha }$ leads to oscillatory behaviour in the route selection, as the decision becomes too sensitive to the traffic.

If a forward ant at node ${\displaystyle k}$ has already visited all the neighbours of ${\displaystyle k}$, it has run into a circular path. In this case, records of the nodes in the cycle are popped from the ant's stack ${\displaystyle S_{s\rightarrow d}}$. If the duration of the cycle is longer than the time taken to reach the cycle (elapsed time before the ant entered the circular path), the ant is destroyed.

When a forward ant reaches the destination, it is converted to a backward ant ${\displaystyle B_{d\rightarrow s}}$, retaining the memory stack ${\displaystyle S_{s\rightarrow d}}$ (in the original paper, it is said that the forward ant transfers its memory to a new backward ant and dies[1]). The backward ant travels along the same path it (forward ant) came from, towards the source node ${\displaystyle s}$. It traces the path by popping the stack ${\displaystyle S_{s\rightarrow d}}$ at each node ${\displaystyle k}$, and travels through a priority queue at the outgoing link. The backward ants reserve this privilege so that fresh information about the route can be propagated quickly.

Backward ants implement the concept of stigmergy, an indirect way of communicating with other ants by modifying the environment. In AntNet, at each node ${\displaystyle k}$ along the path the backward ants update the values in the table ${\displaystyle {\mathcal {M}}_{k}}$ with new observations about the network status and modify the routing table ${\displaystyle {\mathcal {T}}_{k}}$ accordingly; this is analogous to ants leaving pheromone traces. The new information given by the backward ants act as reinforcement signals to push the routing policy towards a local optimum.

In the tables ${\displaystyle {\mathcal {M}}_{k}}$ and ${\displaystyle {\mathcal {T}}_{k}}$, all the entries corresponding to the destination node ${\displaystyle d}$ are updated. Additionally, the entries belonging to nodes in between node ${\displaystyle k}$ and destination ${\displaystyle d}$ - i.e. node ${\displaystyle n\in S_{k\rightarrow d},\quad n\neq d}$ - are updated, provided that the trip times for these "sub-paths" are statistically "good". Statistically "good" means that the elapsed time is less than ${\displaystyle \mu +I(\mu ,\sigma )}$, the sum of the mean trip time and the confidence interval ${\displaystyle I(\mu ,\sigma )}$. The nodes between node ${\displaystyle k}$ and destination ${\displaystyle d}$ are potential destinations for other ants and therefore updating these "sub-paths" provide information for other source-destination pairs at zero cost.

The mean trip time ${\displaystyle \mu _{k,d}}$, variance ${\displaystyle \sigma _{k,d}^{2}}$, and best trip time ${\displaystyle b_{k,d,W}}$ during observation window ${\displaystyle {\mathcal {W}}_{d}}$ are updated in table ${\displaystyle {\mathcal {M}}_{k}}$ as described above, using information stored in the ant's memory stack ${\displaystyle S_{s\rightarrow d}}$. Given that the backward ant just arrived from node ${\displaystyle f}$, the routing table ${\displaystyle {\mathcal {T}}_{k}}$ is updated by incrementing the probability ${\displaystyle P_{f,d}}$ - i.e. the probability of choosing node ${\displaystyle f}$ as the next node when destination is ${\displaystyle d}$ - by an amount proportional to a "goodness" measure ${\displaystyle r}$ of the trip time ${\displaystyle t_{k\rightarrow d}}$. The other probabilities ${\displaystyle P_{n,d},\quad n\neq f}$ are decremented proportionally as well.

${\displaystyle P_{f,d}\leftarrow P_{f,d}+r(1-P_{f,d})}$
${\displaystyle P_{n,d}\leftarrow P_{n,d}+rP_{n,d}}$

Observed trip time ${\displaystyle t_{k\rightarrow d}}$ is the only explicit feedback signal used to evaluate the "goodness" measure ${\displaystyle r}$, since it implicitly captures information about the physical properties of the path such as transmission capacity of the links, processing speed of the nodes, number of hops; as well as dynamic information about the system like the congestion level. Evaluating the "goodness" of trip time ${\displaystyle t_{k\rightarrow d}}$ is not a trivial task, since there is no objective "optimal" trip time to compare against. An apparently poor trip time ${\displaystyle t_{k\rightarrow d}}$ under low traffic could actually be a good one under heavy traffic, so we cannot compute some kind of an error signal like we do in back-propagation (in neural nets). The "goodness" measure ${\displaystyle r}$ can only be used as a reinforcement signal like a "reward" in the context of Reinforcement Learning. ${\displaystyle r\in (0,1]}$ is a dimensionless value and is critical to the AntNet algorithm and will be discussed in more detail below.

The first challenge in determining the reinforcement signal for the path from node ${\displaystyle k}$ to node ${\displaystyle f}$ is the credit assignment problem typically seen in RL. The only available metric is the elapsed time ${\displaystyle t_{k\rightarrow d}}$ between node ${\displaystyle k}$ to the destination node ${\displaystyle d}$ and we cannot know how much the sub-path between node ${\displaystyle k}$ and ${\displaystyle f}$ contributed to the elapsed time. The second challege is finding the right balance between adaptivity and stability. As mentioned in the previous paragraph, the "goodness" is relative to the current traffic conditions, so the reinforcement signal should be able to account for the network status. At the same time, it should not be overly sensitive to the fluctuations in traffic, as this can lead to oscillating routes. In addition, an important insight is gained by considering a simple reinforcement signal ${\displaystyle r=constant}$. In this case, all paths traversed by the ants are rewarded the same amount, independent of each ant's observed statistics. There is an implicit reinforcement mechanism at work, as faster links are traversed more often and accumulate rewards at a higher rate. The obvious drawback in this case is that bad paths are rewarded too. The authors' experiments show that even with a constant reinforcement signal the network performs moderately well, suggesting that this implicit reinforcement is an important component in the AntNet algorithm. To take into account the information collected by the ants, the feedback signal ${\displaystyle r}$ is computed as a function of the entries in the table ${\displaystyle {\mathcal {M}}_{k}}$ and is given by the following:

First we compute a quantity ${\displaystyle r'}$ using the equation (obvious indices are omitted for brevity)

${\displaystyle r'=c_{1}\left({\frac {t_{best}}{t}}\right)+c_{2}\left({\frac {t_{upper}-t_{best}}{(t_{upper}-t_{best})+(t-t_{best})}}\right)}$
where ${\displaystyle t_{upper}=\mu +{\frac {\sigma }{{\sqrt {1-\gamma }}{\sqrt {\vert {\mathcal {W}}\vert }}}}}$.

${\displaystyle t_{best}=b_{k,d,W}}$ is the best trip time observed in the moving window ${\displaystyle {\mathcal {W}}_{d}}$ as previously mentioned, and ${\displaystyle t_{upper}}$ is an approximation of the upper bound of the confidence interval of the average trip time ${\displaystyle \mu }$ with specified confidence level ${\displaystyle \gamma }$. ${\displaystyle \vert {\mathcal {W}}\vert }$ is the cardinality of the observation window ${\displaystyle {\mathcal {W}}_{d}}$. Using only the upper bound of the confidence interval introduces some kind of asymmetry and inaccuracy in this equation, but the authors claim that it is practical enough considering the extra CPU consumption that would be spent on computing an exact quantity. The first term in the above equation is the ratio between the best trip time and the currently observed trip time. It represents how "good" the current record is relative to the best record seen recently and it is more important than the latter term. The second term is a correction term, evaluating how different the observed trip time is from the average time, taking into account the stability of the value of the average time. In the presented implementation of AntNet, ${\displaystyle c_{1}=0.7}$ and ${\displaystyle c_{2}=0.3}$. Experiments show that setting ${\displaystyle c_{2}\in [0.15,0.35]}$ yields good results.

Then quantity ${\displaystyle r'}$ is then transformed into ${\displaystyle r}$ by using a squash function ${\displaystyle s(x)}$:

${\displaystyle s(x)=\left(1+e^{\frac {a}{x\vert {\mathcal {N}}_{k}\vert }}\right)^{-1},\qquad x\in (0,1],a\in R^{+}}$
${\displaystyle r={\frac {s(r')}{s(1)}}}$

Using the squashing function has an effect of amplifying the high values of ${\displaystyle r}$ while saturating low values of ${\displaystyle r}$, so that "rewards" play a bigger role than "punishments". Using the final value ${\displaystyle r}$, the routing table is updated as previously mentioned:

${\displaystyle P_{f,d}\leftarrow P_{f,d}+r(1-P_{f,d})}$
${\displaystyle P_{n,d}\leftarrow P_{n,d}+rP_{n,d}}$

The authors' experiments indicate that the algorithm is robust to its internal parameter settings and AntNet performs well as long as the paramaters are set to "reasonable" values.

### Analysis

Dhillon and Van Mieghem have produced a comprehensive analysis of the AntNet algorithm[2], empirically comparing its performance against existing state-of-the-art routing protocols such as OSPF and against Dijkstra's shortest path algorithm, a theoretical optimum. The study shows that AntNet's performance is comparable to Dijkstra's algorithm and is robust under varying traffic loads.

#### Computational Complexity

To calculate the complexity of the AntNet algorithm, the complexity of a single ant is first considered. At every node along its path, it needs to compare its memory of the path traversed so far with the neighbours of the node to determine if it has visited some of the nodes before; this is done in ${\displaystyle O(1)}$. After this comparison, it needs to look through the link values in the routing table to select the best neighbour to move to. Assuming a fully connected network where each node has ${\displaystyle N_{k}}$ neighbours, this takes ${\displaystyle O(N_{k})}$. Then the ant updates its memory with the identifier of the node, taking ${\displaystyle O(1)}$ and then is pushed to the outgoing link queue in ${\displaystyle O(1)}$. With ${\displaystyle M}$ being the maximum hop count of a forward ant, the ant has to perform the described computation at each of the ${\displaystyle M}$ nodes. On its way back, at each node the backward ant pops its stack, taking ${\displaystyle O(1)}$, and is pushed to the queue in ${\displaystyle O(1)}$. Updating the routing table requires computing a value for every neighbour of the node, which takes ${\displaystyle O(N_{k})}$. This operation is also done for every hop, and thus the backward ant also has complexity of ${\displaystyle O(MN_{k})}$. Therefore the complexity of a single ant is ${\displaystyle O(MN_{k})}$. For the entire system, given that there are ${\displaystyle \rho }$ ants generated, the worst-case complexity of the AntNet algorithm is ${\displaystyle O(\rho MN_{k})}$. Given that ${\displaystyle M}$ and ${\displaystyle N_{k}}$ are relatively small, this is comparable to the worst-case complexity of Dijkstra's shortest path algorithm using a Fibonacci heap, which is ${\displaystyle O(Nlog(N)+L)}$.

#### Experiments

The authors evaluate the AntNet algorithm under static and dynamic network conditions by comparing the performance against solution found by Dijkstra's shortest path algorithm. The term static here does not have the same connotation as in static analysis; it means that the traffic status and varying size of the packets are not taken into account. More precisely, in the static evaluation it is assumed that the forward ants are only affected by the propagation delays between the links - i.e. time taken for a single bit to travel from one node to another - while queueing delays, transmission delays, and link capacity are ignored. This reduces the node selection probability ${\displaystyle P'_{n,d}}$ to the raw values in the routing table ${\displaystyle P_{n,d}}$. In the dynamic evaluation, all the factors used in the original implementation of AntNet are taken into account. It is important to note that while Dijkstra's algorithm can easily be compared with the static version of AntNet, it is more tricky to compare against the dynamic AntNet. The solution given by Dijkstra's algorithm is the sum of all the link weights (propagation delays) along the path, whereas the solution given by dynamic AntNet includes the transmission and queueing delays. To make a fair comparison between the two, transmission and queueing delays are excluded from the solution found by AntNet and only link weights are considered.

The experiments are carried out in a simulated environment, on random graphs ${\displaystyle G_{p}(N)}$ with ${\displaystyle N}$ nodes and link density ${\displaystyle p}$. In each simulation, ${\displaystyle 10^{4}}$ to ${\displaystyle 10^{5}}$ graphs are assessed. The upper bound ${\displaystyle m_{max}}$ for the average number of paths between a pair of nodes in graph ${\displaystyle G_{p}(N)}$ is given by: ${\displaystyle m_{max}=\left\lbrack p^{N-1}e^{\frac {1}{p}}(N-2)!\right\rbrack }$ The propagation delays for the links are set uniformly randomly between ${\displaystyle (0,1]ms}$ and link capacity is set to 8.192 Mbit/s. Each simulation runs for ${\displaystyle 10^{4}ms=10s}$ and consists of a training period (TP) of ${\displaystyle 10^{3}ms=1s}$ during which only ant packets are generated, and a test period (TEP) of ${\displaystyle 9\times 10^{3}ms=9s}$ during which both ant and data packets are generated. Data packets are generated from nodes selected by Poisson process with mean interarrival time of 12.5ms, headed towards a randomly selected destination. The size of data packets are also randomly set, following a negative exponential distribution with mean value of 4096 bits. The size of a forward ant is initialized at 192 bits and grows by 64 bits per hop, while backward ants are assumed to be 500 bits. The default value of ${\displaystyle a}$ in the squash function ${\displaystyle s(x)}$ is set to ${\displaystyle a=pN}$. For the comparison between AntNet and Dijkstra's algorithm, source node ${\displaystyle s=1}$ and destination ${\displaystyle d=N}$ are considered. While there is just one shortest-path given by Dijkstra's algorithm, in AntNet the data packets travel probabilistically via different routes between a given source node and a destination node. Accordingly, the average end-to-end delay and hop count of the paths taken by the data packets over the whole simulation are used for the evaluation. The effects of changing other configuration parameters such as ${\displaystyle \eta }$, ${\displaystyle \vert {\mathcal {W}}\vert }$ are studied in various simulations in the experiment.

##### Static AntNet

For evaluating AntNet under static conditions - i.e. without considering traffic status - the algorithm is modified to use ${\displaystyle \epsilon -greedy}$ policy for a forward ant choosing the next node. This is because the correction term in ${\displaystyle P'_{n,d}}$ vanishes, preventing the ants from exploring the network. In this simulation the ant generation interval ${\displaystyle I_{ant}}$ is set to 4ms during TP (Training Period) and 40ms during TEP (Test Period), ${\displaystyle \eta =0.1}$, ${\displaystyle \vert {\mathcal {W}}\vert =50}$, and link density ${\displaystyle p=0.2}$. The simulation is carried out for network sizes of ${\displaystyle N=25}$ and ${\displaystyle N=50}$, each simulation consisting of ${\displaystyle 10^{4}}$ iterations of random graphs of type ${\displaystyle G_{p}(N)}$. The effect of ${\displaystyle \epsilon }$ is also observed by setting it to ${\displaystyle \epsilon =0}$ and ${\displaystyle \epsilon =0.1}$. A special case of uniform AntNet, referred to as unfNet for the rest of the discussion, is also considered, where ${\displaystyle \epsilon =1}$ (only explore).

Probability density function of average link weight and hop count for Dijkstra's shortest path, unfNet, and AntNet for varying network size and link density[2]

Static implementation of AntNet is shown to converge to a good solution, performing better when ${\displaystyle \epsilon >0}$. In fact, unfNet is shown to perform better than regular AntNet as all paths are explored with equal probabilities.

##### Dynamic AntNet

In the evaluation of dynamic implementation of AntNet, effects of different parameters are studied, such as ant generation rate, squash function, observation window, and the power function used in the forwarding policy of data packets.

###### Effect of ant generation interval ${\displaystyle I_{ant}}$ and link density ${\displaystyle p}$

In the first analysis, three different values of ant generation interval ${\displaystyle I_{ant}}$ are used: 40ms, 4ms, and 0.4ms. These values are applied only during TP, and the generation interval is fixed at 40ms during TEP. Two different values of link density ${\displaystyle p}$ are used: 0.1 and 0.2.

The results indicate that AntNet yields near optimal solution at low values of ${\displaystyle p}$, but performance degrades as ${\displaystyle p}$ is increased. Performance is slightly worse in the larger network with ${\displaystyle N=50}$ compared to the network with ${\displaystyle N=25}$. In regards to the ant generation rate, AntNet converges to a good solution even at low ant generation rates - i.e. high generation interval ${\displaystyle I_{ant}}$. When ${\displaystyle I_{ant}=40ms}$, although only 25 forward ants are sent out per node during TP, the algorithm finds a good solution. Performance improves as the ant generation rate is increased, but no difference is observed between ${\displaystyle I_{ant}=4ms}$ and ${\displaystyle I_{ant}=0.4ms}$. This can be attributed to the correlation between the size of the observation window ${\displaystyle \vert {\mathcal {W}}\vert }$ and the ant generation rate. With ${\displaystyle \vert {\mathcal {W}}\vert =50}$, the observation window is not large enough to capture the effect of changing ${\displaystyle I_{ant}=4ms}$ to ${\displaystyle I_{ant}=0.4ms}$. Additional simulations with ${\displaystyle \vert {\mathcal {W}}\vert =200}$ and ${\displaystyle \eta =0.02}$ confirm this idea. At low ant generation rate, unfNet performs worse than AntNet since there are not enough ants to discover the shortest path in the given time. At higher ant generation rates, the performance of AntNet and unfNet are comparable.

###### Effect of moving observation window ${\displaystyle {\mathcal {W}}}$ and ${\displaystyle \eta }$

It is worth reiterating that the parameter ${\displaystyle \eta }$ used in computing the running average ${\displaystyle \mu }$ of the trip time and the size of the moving observation window ${\displaystyle {\mathcal {W}}}$ are conceptually coupled. The original AntNet uses the relation ${\displaystyle \vert {\mathcal {W}}\vert ={\frac {5c}{\eta }},\quad c<1}$. It is hard to determine the optimal size of the moving observation window since its effect is correlated with the ant generation interval. Intuitively, it would make sense to keep an observation window large enough to store diverse samples of the paths, but it cannot be guaranteed that different ants would be exploring distinct and unique paths within that window. Results show that increasing the size of the observation window with a fixed ${\displaystyle \eta }$ improves the performance slightly. At light traffic loads, ${\displaystyle \eta }$ does not influence the performance much, as the mean ${\displaystyle \mu }$ converges quickly. ${\displaystyle \eta }$ would play a more significant role when there are fluctuations in the traffic loads. In order to observe the combined effect of ${\displaystyle {\mathcal {W}}}$, ${\displaystyle \mu }$, and ${\displaystyle I_{ant}}$, an additional simulation is performed with extreme values of the parameters. Ant generation interval ${\displaystyle I_{ant}}$ is set to 0.4ms during TP and 4ms during TEP, ${\displaystyle \vert {\mathcal {W}}\vert =16000}$, and ${\displaystyle \eta =0.001}$. The size of the observation window ${\displaystyle {\mathcal {W}}}$ is large enough to store the trip times of all the forward ants generated by the node as well as other incoming ants.

###### Effect of confidence interval, squash function, and power function

The confidence interval of the mean trip time ${\displaystyle \mu }$ and the squash function ${\displaystyle s(x)}$ are used in the calculation of the reinforcement signal ${\displaystyle r}$. The confidence interval term ${\displaystyle {\frac {\sigma }{{\sqrt {1-\gamma }}{\sqrt {\vert {\mathcal {W}}\vert }}}}}$ is mainly used to determine the reliability of the mean trip time ${\displaystyle \mu }$ when comparing it with the newly observed trip time. The simulations show that this term has very little effect on the performance of AntNet and can even be removed to simplify the AntNet algorithm. Using larger values of ${\displaystyle a}$ in the squash function ${\displaystyle s(x)=\left(1+e^{\frac {a}{x\vert {\mathcal {N}}_{k}\vert }}\right)^{-1},\quad x\in (0,1],a\in R^{+}}$ is shown to improve performance, demonstrating the effect of amplifying the reinforcement signals. Finally, the power function ${\displaystyle g(P)=P^{\beta }}$ is used to determine the routes for the data packets. Setting ${\displaystyle \beta =1}$ implies that the routing decision is dictated by the values in the routing table, leading to data packets taking different sub-optimal paths with a non-zero probability. Setting a very large value for ${\displaystyle \beta }$ causes the data packets to always travel through a single best path as seen locally by the node. Results show that AntNet performs better with ${\displaystyle \beta =100}$ indicating that the shortest path was correctly discovered.

###### Behaviour under varying traffic

To observe the behaviour of AntNet under varying traffic, randomly chosen nodes are congested during each iteration of the simulation by adding a queueing delay of 10ms per packet. Since it is not possible to account for traffic in Dijkstra's algorithm, one version of AntNet "src_rt" is configured to send out data packets through the shortest path found by Dijkstra's algorithm. Then the end-to-end delays measured in "src_rt" and regular AntNet are compared to see whether the theoretical shortest path is actually the shortest when there are congested nodes along its path. Results show that AntNet outperforms "src_rt" in terms of end-to-end delay, indicating that AntNet was able to find alternative routes that do not have congested nodes. When there are no congested nodes, "src_rt" and AntNet's performance are similar. This load balancing capability of AntNet is demonstrated further by running a slightly different experiment. Five randomly chosen nodes are congested 5000ms after the start of the simulation period, and average packet delay and hopcount are computed over a moving window of 250ms and plotted. AntNet's performance is constant before the 5000ms mark, until there is a sudden increase in the delay at the point of congestion. The delay is quickly reduced in AntNet, showing that it has found an alternative route, whereas "src_rt" suffers from the congestion.

End-to-end delay and hop count of AntNet versus shortest path found by Dijkstra's algorithm under varying traffic conditions[2]
Path delay and hop count over time for AntNet and "src_rt", illustrating the load balancing capabilities of AntNet[2]

#### Discussion

Various simulations were performed to examine the different features of AntNet, and results show that AntNet is able to discover near optimal solutions quickly even with a limited number of ants. There are several configuration parameters that are correlated, where changing a single parameter has little influence on the performance unless other coupled parameters are together adjusted. More importantly, it was shown that the theoretical shortest path found by Dijkstra's algorithm may be sub-optimal in the case of congestion, and that AntNet was able to quickly find alternative routes for the data packets. Thus AntNet provides inherent load balancing capabilities and is robust to changing traffic conditions, making it an attractive routing protocol. However, with larger and denser networks, AntNet's performance degraded, suggesting it may require a longer training period. The assessment on the scalability of AntNet algorithm and its applicability in ad hoc networks remains as future work.

## Annotated Bibliography

1. Di Caro, G. and Dorigo, M., 1998. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9, pp.317-365.
2. Dhillon, S.S. and Van Mieghem, P., 2007. Performance analysis of the AntNet algorithm. Computer Networks, 51(8), pp.2104-2125.
3. Bianchi, Leonora; Marco Dorigo; Luca Maria Gambardella; Walter J. Gutjahr (2009). "A survey on metaheuristics for stochastic combinatorial optimization". Natural Computing: an international journal. 8 (2): 239–287.
4. M. Dorigo. 2007. Ant Colony Optimization. Scholarpedia.org
5. Schrijver, Alexander (February 1, 2006). A Course in Combinatorial Optimization
6. Bertsekas, D.P., Gallager, R.G. and Humblet, P., 1992. Data networks (Vol. 2). New Jersey: Prentice-Hall International.
7. Di Caro, G. and Dorigo, M., 1997. AntNet: A mobile agents approach to adaptive routing. Technical Report IRIDIA/97-12, IRIDIA, Université Libre de Bruxelles, Belgium.

 Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0