On-line Pattern Analysis by Evolving Self-organizing Maps

From UBC Wiki
Jump to navigation Jump to search

On-line pattern analysis by evolving self-organizing maps

One sentence summary

Principal Author: Svetlana Sodol


This should be a brief summary that tells us what is covered in this page.

Builds on

Put links to the more general categories that this builds on. These links should all be in sentences that make sense without following the links. You can only rely on technical knowledge that is in these links (and their transitive closure). There is no need to put the transitive closure of the links.

SOM, neural gas

Related Pages

This should contain the reverse links to the "builds on" links as well as other related pages. Use in sentences.


Growing Neural Gas


"A growing neural gas model learns topologies" paper presents an unsupervised learning approach to solving dimensionality reduction, the main idea of which can be described as:

“Finding low dimensional subspace of the input vector space containing most or all of the input data”

Some already existing solutions are to use direct methods like PCA or iterative methods like Self-organizing maps (SOM) and Growing Cell Structures (GCS). However the problem with these methods is that the dimension size of the output feature map for SOM or GCG (or the number of principle components for PCA) have to be specified ahead of time. There has been another method titled "Neural Gas" (NG) described in 1991, that uses competitive Hebbian learning (CHL) and thus requires some use of a VQ method, to find topological structure that closely reflects how the input data is distributed. This paper proposed a new network model that also uses CHL and is based on NG, but is incremental and only has constant parameters, and thus offers advantages over other models.


2.      CHL/NG

CHL principle:

“for each input signal x connect the two closest centres (measured by Euclidean distance) by an edge”

-         Assume number of centres in R^n iteratively insert topological connections

-         Resulting graph optimally preserves topology in a very general sense (titled induced Delaunay triangulation, limited to only where the input distribution is greater than zero)

-         Only centres at/near data are included in the manifold, rest are dead. For most usefulness should place them only there – use VQ (so we use all centres that we have)

Neural gas is a form of VQ

Principle: “for each input signal x adapt the k nearest centers whereby k is decreasing from a large initial to a small final value” (in time)

Large initial k causes movement towards input of many centres, decreases with time, need to know number of steps in advance

Run: NG and then CHL

Or both concurrently: need to remove obsolete edges somehow

Age gaining scheme. CHL does not influence NG, but NG influences CHL

Effective for topology learning

Problem in practice of original CHL/NG: know the number of centres ahead of time, cost of mistake to redo


3.      Growing Neural Gas Algo


Network: set of nodes A, each node has reference vector in Rn space

N edges unweighted, among pairs of nodes – to define the topological structure

Input: is possible infinite n-dimensional from a distribution (have age parameter)

Main idea: Successively add new units to an initially small network by evaluating local stats of previous step

(same approach as growing cell structure fritzke 1994b, but this has fixed dimensionality)

Here dimensionality depends on input, may vary locally.

Have parameters for how much to move s1 and neighbours; a max max age for edges;

Param lambda to divide algo into sections; alpha constant for initializing new unit error;

Constant d for decreasing error


-         Start with 2 units at random positions in rn space

-         Sample input example

-         Find the nearest 2 units to the input example s1 and s2

-         Increment age of all edges from s1

-         Delta error (s1) = squared error (w s1 - data) (local counter vars)

-         Move s1 and all its neighbours towards input data by param*(diff data - node)

-         If s1 and s2 are connected by edge, set age to 0, or create it (Hebbian in nature – correlation=connection)

-         Remove edges with age larger than a max, if points have no edges delete them

-         Every lambda inputs, add new unit:

Determine unit with max accumulater error q

Insert new unit r halfway between q and its neightbour f (highest error of the neighbours)

W r = 0.5 (wq + wf)

Connect r to q and f, remove original edge q f

New error for q and f is costant alpha times themselves, error r =error q

-         Decrease all error by multiplying with d

-         If not yet done go back to step 1

(why not descrfibe the params needed? )

Removing edges – their end points have moved

Introduce new nodes in very erroneous areas

Simulation Results

4.      Simulation results

- tested on model used for original NG – quickly learns important

(of diff dimens but how???)

600-20000 inputs,  l = 100, e b = 0.2, e n = 0.006; alpha = 0.5; a max = 50, d = 0.995

- 2nd test compares to NG as well, final is similar,100 units , 10k steps

Intermediate are different

Both identify clusters, GNG could continue to grow to discover small clusters (but none are present here)

Testing is pretty sketchy, don’t show the advantages, don’t show that its actually reversible mapping? Don’t show the power of not limiting the number of units before hand (this was identified as a problem in original NG?)

What is the stopping criterion?



Able to work to model

Does not need size

All parametes constant over time

Haven’t really shown the first 2

Note: topology is not extra

Need proper initialization of new units to have constant params and local adaptations

Possible applications:  clustering, VQ

Combine with supervised learning (like has been done with GCS frizke 1994c)

-         Possible to choose arbitrary insertion criterion

Evolving Self-Organizing Maps

On-line pattern analysis by evolving self-organizing maps

(online pattern analysis)


1.      Problem: process online data streams that change often

Input/output distributions are unknown and may change over time

Parameters need to be updated incrementally (sounds more like a solution rather than a problem?)

Limited memory

-         Required efficiency and flexibility – neural network models

Network size in k-means/SOM/NG is fixed

Prototype space of SOM is also fixed

But desire structure and size to evolve with incoming data

Fritke – GCS and growing models

Propose: ESOM – soft (unlike k -means cant get  stuck in local minima) and evolvable network and fast incremental learning

Data clustering and classification – competitive with other approaches


2.      ESOM

Here is the main "build on" of this paper on the first paper

The structure of the ESOM network is as in the GNG (builds on the idea that these kinds of things can be done incrementally)

- null network embedded in input space, new nodes created incrementally, with new input its 2 winners get connected,

- if no nodes match with current input new node is created

learning rule insert here - could be specified to a neural gas or SOM

new node insertion - exactly the poorly matched value, not like in GNG (GNG will take more iterations to tune new node to a new cluster)

doing it this way can introduce noise/artefacts, but can be solved by automatic deletion of obsolete nodes

Aim: life-long learning, strict convergence is not critical

learning rate not drops to zero asympotically

learning rule similar to Heskes and Kappen [12] - optimal exists

usually is 0.05 constant

distance threshold epsilon - trial and error

forgetting constant 0.8 (hebbian learning with forgetting)

no topological order as in SOM, so need connections to define neighbours -> save time in NG, use for vis


1) input new data vector x

2) if there are no matching go to INSERTION

3) MATCHING find winner, if min distance is smaller than threshold go to UPDATING, otherwise to INSERTION

4) INSERTION create new, connect with 2 best matching

5) UPDATING modify winner, neightbours and connections eqs 13 and 14

eq 13 - learning rule of ESOM

eq 14 - connection stength

6) DELETION every Tp steps prune connection with weakest stength

7) repeat from step 1 until no more data

time complexity comparable to GCS/GNG

search is O(N), GCS best, GNG/ESOM 2 best- more efficient than neural gas O(N log N)

in ESOM no track of error and node with largest error


3.      Simulations

  1. colour image quantization
  2. 2 spiral classification
  3. diabetes diagnosis
  4. vowel recognition

Colour image quantization as online data clustering

reduce number of colours in digital colour image, closely related to image compression

Problem: time-consuming

Compare to: 2 basic regular approaches to this: median-cut, octree, k means, wus method [26]

quantize pixel by pixel only k means and ESOM online

3 images chosen 24-bit, true color, smooth colour, details, shades, many colours mixed, images popular in image processing

ESOM best results for the 2 natural images, close to wus for the artificial one

ESOM images have better visual quality

Benchmark tests

ESOM is unsupervised, but can be applicable in supervised setting (just like SOM/other clustering) 1 nn approach

data is coordinates

CMU 2 spiral-data, ESOM needs fewer prototypes for 0 error rate, 1 epoch - single pass through data set (GCG over 100 epochs)

Compared to: GCS, LVQ, SOM

Pima Indian diabetes set UCI rep

data is personal data and if positive or not

10-fold cross validation, data presented 5 times in random order, average performance

epsilon = 0.4

Compared to knn, class/regression trees, MLP, LVQ, LDA - pretty similar

GNG/GCS -only 1 epoch, similar accuracy, GNG needs less nodes but many more epochs

CMU data set

input is speech signals, 4 male, 4 female, 990 frames

off line to set params epsilon = 0.5, learning = 0.05

Compated to perceptrons, nn, GGS. least weights for similar performance, only 1 pass learning

Extend to continue training on-line: reduces number of errors on diabetes/vowel sets


4.      Discussion

vis is not strainght forward as we are in the original input space

compared to regular SOM results are better, less time needed to train, more visual clues as better clustering/proximity

for use in info retrieval do threshold scales - hierarchial feature maps

aggregation to reduce prototype number

self-tuning of system params

apps: multimedia info retrieval, online finincial data analysis



they are pretty explicit about the similiarities/differences with the GNG model/algo, although they use sometimes different terminology/maths

main idea is to use the idea that online learning needs incremental change in the network and the fact that this is shown with NG

so they try a very similar thing with a SOM, to increase its fit to the online learning domain

this isnt a very straight forward/easy extension, as there are many changes and they also talk a lot about NG and CGS and some other models that share some aspects

seems like they are putting together little things from many already explored/iplemented/tried places and apply to a new (sorta, as its similar and NG was inspired by SOM?) setting


benchmarks show about 100 nodes needed (scale of 100)

some results to compare are taken from other papers, not same authors - might be some details that are important missing

ESOM is tested in much more detail than as the GNG presented, with more sets, compared with more algorithms, on more diverse data.

Makes it seem like over time the emperical results have shown to be needed to be properly emphasized and executed.

also maybe more data available and more comparisons/algorithms/uses for such algorithms


Reversible mapping does not exist in general, so some info can be lost

Topology learning: what structures can be mapped reversably?

It would be great to see theoretical proofs of performance and how much topology can be saved for sure in these methods

Annotated Bibliography

Put your annotated bibliography here. Add links where appropriate.

To Add

Put links and content here to be added. This does not need to be organized, and will not be graded as part of the page. If you find something that might be useful for a page, feel free to put it here.

- summary, abstract, related to, builds on

- text of part 1

- text of part 2

- text of part 3

- references

- images?

Some rights reserved
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0