Course:CPSC522/Online Pattern Analysis by Evolving SelfOrganizing Maps
Online pattern analysis by evolving selforganizing maps
The evolving selforganizing map enhances the original selforganizing map by taking inspiration from the work on growing neural gas, which explores an implementation of a model that learns topology of the input without a predetermined output model size.
 Fritzke, Bernd (January 1994). "A Growing Neural Gas Model Learns Topologies" (PDF). Proceedings of the 7th International Conference on Neural Information Processing Systems: 625–632.
 Deng, Da; Kasabov, Nikola (April 2003). "Online pattern analysis by evolving selforganizing maps". Neurocomputing. 51: 87–103.
Principal Author: Svetlana Sodol
Abstract
This page describes two papers that propose models that incrementally grow in size the output models they produce  growing neural gas and evolving selforganizing map. The latter is inspired by the first and the papers presented are separated in time by almost 10 years. The evolving selforganizing map borrows the basic structure and approach of the growing neural gas while introducing changes like making insertion of new nodes faster and keeping track of a smaller number of model parameters. The evolving selforganizing map work also provides more context for usage of such an incremental learning model in an online environment for classification and clustering.
Builds on
The concepts in this page are based on models and algorithms of SOM, neural gas, competitive Hebbian learning and vector quantization.
Related Pages
Some algorithms that are similar in nature or tasks are PCA, kmeans and neural networks.
Content
Introduction
The paper "Online pattern analysis by evolving selforganizing maps" attempts to solve the problem of processing online data streams. This could be a classification or a clustering task of a dataset that is presented to the model example by example, not all together. The inputs and outputs might come from unknown distributions and change over time. This means that the parameters of any model need to be updated incrementally with limited memory resources. Authors point out that this requires efficiency and flexibility and thus many of the models that suit these tasks are neural network models. Previously used models are critiqued as having predetermined sizes so there is open demand for models that do not need their sizes predetermined before any data is seen. An evolving model titled "growing neural gas" (GNG) has been proposed by Fritzke^{[1]} and was based on the model of the "neural gas" (NG) , which is a very similar concept to the SOM. This showed that fast incremental learning of not only the structure of the output model but also the size of it, is possible and so the growing models of Fritzke are taken as the inspiration to the evolving selforganizing map (ESOM)^{[2]}. The fact that these algorithms all go through the input examples in succession allows this to be used for streaming or online settings. Although this does not limit the usage to only the online realm, as any dataset can usually be streamed as was done for many benchmarks that will be described.
In the following sections, first, the initial paper by Fritzke on the growing neural gas will be described. Then, the more recent paper on the evolving SOM's. The last section will outline some of the comparisons and critiques of the two papers, as well as provide a bigger picture on the contributions.
Growing Neural Gas^{[1]}
Introduction
"A growing neural gas model learns topologies" paper presents an unsupervised learning approach to solving dimensionality reduction. The main idea of dimensionality reduction is described by the authors as "finding low dimensional subspace of the input vector space containing most or all of the input data”.
Some already existing solutions are to use direct methods like PCA or iterative methods like selforganizing maps (SOM)^{[3]} and growing cell structures (GCS)^{[4]}. However the problem with these methods is that the dimension size of the output feature map for SOM or GCG (or the number of principle components for PCA) have to be specified ahead of time. There has been another method titled "neural gas" (NG) described in 1991, that uses competitive Hebbian learning (CHL) to find topological structure that closely reflects how the input data is distributed. This paper proposes a new network model that also uses CHL and is based on NG, but is incremental and only has constant parameters, thus offering advantages over other models.
Background
The competitive Hebbian learning (CHL) principle is “for each input signal x connect the two closest centres (measured by Euclidean distance) by an edge”. The centres here are like the cluster centres in kmeans  they form the actual output model that we want to produce, to represent the distribution of the input data.
We start by assuming a certain number of centres in and then iteratively insert topological connections  edges of the resulting network. The resulting graph optimally preserves topology in a very general sense. This is titled "induced Delaunay triangulation". Since the resulting graph is limited to where the input distribution is greater than zero, for most usefulness we should only place the centres at these locations by using vector qunatization (VQ).
The neural gas (NG) method is a form of VQ. The principle is as follows: “for each input signal x adapt the k nearest centers whereby k is decreasing (through time) from a large initial to a small final value”. This allows to move the nodes of the resulting network closer to the places of interest where the input data is actually coming from.
NG and CHL can be run in succession or concurrently, as although CHL does not influence NG, NG influences CHL. Where we have the centres matters for which centres we find to be the closest to the input data and therefore which we connect to form the topology. Although this has been shown effective in topology learning, in practice the problem occurs with knowing the correct number of centres ahead of time in order to have accurate learning.
Algorithm
In order to avoid specifying the final number of centres for the model, the authors introduce the "growing neural gas" algorithm, that successively adds new centres to an initially small network by evaluating local statistics. This is a similar approach to GCS^{[4]} but with dimensions dependent on input, which may vary locally.
Input: possibly infinite ndimensional data set of a certain distribution
Output: a network that consists of set of nodes (the centers), each node has reference vector in space, and unweighted edges among pairs of nodes – to define the topological structure
Some of the parameters that are important for the algorithm:
 "ages" for the connections that increment with each iteration the edge and its nodes have not been used and delete the edges if they are older than
 every node has its reference vector and an accumulated error
 parameter that controls how often we add new nodes to the network
 parameters (for the winner node) and (for the winner's neighbours) that control how these nodes' reference vectors are moved towards the incoming data
 parameter that controls how the new nodes' errors are initialized
 parameter that controls how much the accumulated error for all nodes decreases with each iteration
Algorithm:
 Start with 2 nodes at random positions in space
 Sample input distribution for an example
 Find the nearest 2 nodes to the input example  and
 Increment age of all edges from
 Update error accumulator for :
 Move and all its neighbors towards input example by , where the parameter is for and for rest
 If and are connected by an edge, set age to 0, otherwise create it
 Remove edges with age larger than , if nodes have no edges delete them
 Every input examples, add new node: Determine node with largest accumulator error; Insert new node halfway between node and its highest error neighbor node ; ; Connect to and , remove original edge between and ;
 Decrease all errors by multiplying with parameter
 Repeat from step 2 until done
Simulation Results
This GNG model was tested on a dataset used for original NG and the results are that GNG quickly learns important data topology structures.
This test had 60020000 inputs and the resulting network had 100 edges between nodes.
The 2^{nd} test compared GNG to NG. The final topology learnt was similar, with both models having 100 nodes after 10,000 steps. However, intermediate networks were different. In the image to the right you can see the snapshots of this evaluation at different stages of learning. The lefthand column is the NG model and the righthand is the GNG. The input data structure is the big cirles and the block in the centre, while the small cirles represent the model nodes and the lines are the connections. One can see how the GNG model can also take the shape of the input data distribution through time while increasing the number of used nodes. This image is taken from the paper by Fritzke^{[1]}.
Evolving SelfOrganizing Maps^{[2]}
Here we describe the second paper "Online pattern analysis by evolving selforganizing maps". The structure of the ESOM network is as in the GNG. It builds on the idea that learning the size of the topology network can be done incrementally. The main contribuion of this paper is the implementation of the evolving topology network and showing emperical results that this is competitive to other clustering and classification methods. It is importnant to note that the evaluations are not actually performed "online", the datasets that are used are fixed and are wellexlored. However, the datasets are treated as if they are online since they are provided to the model one by one.
ESOM
It starts with an empty network embedded in input space, new nodes are created incrementally and with each new input its 2 winner nodes get connected. If no nodes match with current input a new node is created.
The insertion of new nodes is different from the GNG to reduce the number of iterations needed to tune the new node to become a new cluster  instead of the insertion at halfway point, the new node is created at exactly the poorly matched input value.
To prevent this from introducing noise and artifacts, careful automatic deletion of obsolete nodes is completed. The main goal of this model is lifelong learning, so strict convergence is not critical. The learning rate is set constant, usually of 0.05, the forgetting constant (from Hebbian learning with forgetting) to 0.8 and the distance threshold that controls node insertion and thus cluster formation is determined by trial and error. This constant represents the maximum difference between a node and an input that can qualify it as a match. Another change from GNG is that there is no longer tracking of errors for all the nodes.
Algorithm (some of the variable names are changes to indicate the similarities to the GNG algorithm more explicitly):
1) Sample new input data vector
2) If there are no nodes go to INSERTION
3) MATCHING: find 2 bestmatching nodes and , if go to UPDATING, otherwise to INSERTION
4) INSERTION: create new node representing the input, connect with 2 best matching existing nodes
5) UPDATING: modify winners and and their neighbors that have positive connection edges with the learning rule of ESOM:
where
and update their edge connection strengths by:
6) DELETION: every steps prune the edge with the weakest strength
7) Repeat from step 1 until there is no more data
The time complexity of this is comparable to GCS/GNG and is more efficient than original NG.
Simulations
ESOM is shown to be competitive with other approaches to clustering and classification.
4 different simulations were performed to evaluate the ESOM model.
1) Colour image quantization as online data clustering
The goal of this task is to reduce number of colours in a digital colour image, this is closely related to image compression. The problem is usually that this is timeconsuming, and so an "online" solution would perform faster. ESOM was compared to the two basic approaches to this task as well as to kmeans and Wu's method on 3 images popular in the field. ESOM had the best results for 2 images and had close performance to Wu's method on the last one. It was also noted that ESOMproduced images have better visual quality.
2) 2spiral coordinate data classification
ESOM is unsupervised, but can be applicable in supervised setting for classification using the 1 nearest neighbor approach. In the 2spiral data set the inputs are coordinates. ESOM was shown to need the fewest number of centres to get 0% error rate (about 100 centres), and required only a single pass through data set. It was compared to GCS, LVQ and SOM.
3) UCI Pima Indian diabetes set
The data in this benchmark is personal patient data and a binary classification indicating if they are positive for diabetes or not. Training was done with 10fold cross validation, with data presented 5 times in random order, and average performance was compared. Compared to knearest neighbors, class/regression trees, MLP, LVQ and LDA, ESOM had pretty similar results. As compared to GNG/GCS, the ESOM only required one pass through the data for similar accuracy, however, GNG needed less nodes (but more passes through the data).
4) CMU vowels data set
The input data for this set are 990 frames of speech signals, from 4 males and 4 females. The training was performed offline to set the parameters. It was then done online and compared to perceptrons, nearest neighbours methods and GGS. ESOM required the least number of weights for similar performance, and only single pass learning.
Conclusions
ESOM can be extended to continue training online (as new examples keep coming in): authors predict that this will reduce the number of errors on diabetes and the vowels sets. Better or similar performance was shown on the benchmarks, with many advantages like singlepass learning. Compared to regular SOM, the results are better, with less time needed to train, and more visual clues are present in the resulting network, as better clustering/proximity is found. However, visualization is not usually straight forward as the output is in the original input space.
Future directions are outlined to be extending ESOM for use in information retrieval through threshold scales  doing hierarchical feature maps, using aggregation to reduce the number of nodes and selftuning of system parameters.
Possible applications are multimedia information retrieval and online financial data analysis.
Discussion
Contributions
The authors of the "Online pattern analysis by evolving selforganizing maps" paper are very explicit about the similiarities and differences with the GNG model and algorithm, although they use sometimes different terminology and notation. The main idea is to use the concept that online learning needs incremental change in the network and the fact that this is shown to work with NG already. So they try a very similar thing with a SOM, to increase its fit to the online learning domain, where the input data is not present for preconfiguring of the model space. This is not a very straight forward or easy extension, as there are many changes and they also talk a lot about NG and GCS and some other models that share some aspects with ESOM. It seems like the authors are putting together little things from many already explored and implemented places and apply to a somewhat new setting.
Evaluations
For the evaluation of the ESOM model, some results to compare to are taken from other papers, not reproduced for this study. This is a weak point about the evaluation, as there might be some important details missing in the other implementations that could impact these comparisons.
However, ESOM is tested in much more detail than as the GNG presented in "A Growing Neural Gas Model Learns Topologies", with more sets, compared with more algorithms, on more diverse data. This might be the result of more work being done in this field and simply more available data available that is freely accesible and has already became a benchmark in the field.
Improvements
For the first paper there are still many unanswered questions. At one point the authors mention that "reversible mapping does not exist in general, so some information can be lost"^{[1]}. They go on to talk about topology learning from the persepctive of finding out what structures can be mapped reversably. This is however not brought up into the discussion again and remains unswered. This, however, is very important for talking about accuracy of topology learning models  a low accuracy on a less "reversable" dataset might be the best any model can do, even though it is not all that well. Some theoretical proofs of performance and how much topology can be saved for sure in these methods would also be interesting to see.
Some other things missing from the first paper are how exactly the GNG model gets locally different dimensions dependent on the data. This is a very interesting concept but it is not extended in the second paper at all. The emperical evaluation of the GNG in the original paper is also small, it was done only on 2 simple datasets and thus many of the promised properties were not shown emperically.
Both papers would benefit from including more background and discusion on parameters and their tunings, their effects on speed and accuracy.
The papers that were chosen do not seem to be the final result of working on these types of problems and models, but only the introductions. More follow ups exist with further extensions, comparisons and testings. This highlights that research is often a work in progress and there is always room for improvements.
Annotated Bibliography
 ↑ ^{1.0} ^{1.1} ^{1.2} ^{1.3} Fritzke, Bernd (January 1994). "A Growing Neural Gas Model Learns Topologies" (PDF). Proceedings of the 7th International Conference on Neural Information Processing Systems: 625–632.
 ↑ ^{2.0} ^{2.1} Deng, Da; Kasabov, Nikola (April 2003). "Online pattern analysis by evolving selforganizing maps". Neurocomputing. 51: 87–103.
 ↑ Kohonen, T. "Selforganized formation of topologically correct feature maps". Biological Cybernetics. 43: 59–69.
 ↑ ^{4.0} ^{4.1} Fritzke, B (1994). "Growing cell structures  a selforganizing network for unsupervised and supervised learning". Neural Networks. 7 (9): 1441–1460.
To Add
