Online Pattern Analysis by Evolving Selforganizing Maps
Online pattern analysis by evolving selforganizing maps
One sentence summary
Principal Author: Svetlana Sodol
Collaborators:
Abstract
This should be a brief summary that tells us what is covered in this page.
Builds on
Put links to the more general categories that this builds on. These links should all be in sentences that make sense without following the links. You can only rely on technical knowledge that is in these links (and their transitive closure). There is no need to put the transitive closure of the links.
SOM, neural gas
Related Pages
This should contain the reverse links to the "builds on" links as well as other related pages. Use in sentences.
Content
Growing Neural Gas
Introduction
"A growing neural gas model learns topologies" paper presents an unsupervised learning approach to solving dimensionality reduction, the main idea of which can be described as:
“Finding low dimensional subspace of the input vector space containing most or all of the input data”
Some already existing solutions are to use direct methods like PCA or iterative methods like Selforganizing maps (SOM) and Growing Cell Structures (GCS). However the problem with these methods is that the dimension size of the output feature map for SOM or GCG (or the number of principle components for PCA) have to be specified ahead of time. There has been another method titled "Neural Gas" (NG) described in 1991, that uses competitive Hebbian learning (CHL) and thus requires some use of a VQ method, to find topological structure that closely reflects how the input data is distributed. This paper proposed a new network model that also uses CHL and is based on NG, but is incremental and only has constant parameters, and thus offers advantages over other models.
Background
2. CHL/NG
CHL principle:
“for each input signal x connect the two closest centres (measured by Euclidean distance) by an edge”
 Assume number of centres in R^n iteratively insert topological connections
 Resulting graph optimally preserves topology in a very general sense (titled induced Delaunay triangulation, limited to only where the input distribution is greater than zero)
 Only centres at/near data are included in the manifold, rest are dead. For most usefulness should place them only there – use VQ (so we use all centres that we have)
Neural gas is a form of VQ
Principle: “for each input signal x adapt the k nearest centers whereby k is decreasing from a large initial to a small final value” (in time)
Large initial k causes movement towards input of many centres, decreases with time, need to know number of steps in advance
Run: NG and then CHL
Or both concurrently: need to remove obsolete edges somehow
Age gaining scheme. CHL does not influence NG, but NG influences CHL
Effective for topology learning
Problem in practice of original CHL/NG: know the number of centres ahead of time, cost of mistake to redo
Algorithm
3. Growing Neural Gas Algo
Output:
Network: set of nodes A, each node has reference vector in Rn space
N edges unweighted, among pairs of nodes – to define the topological structure
Input: is possible infinite ndimensional from a distribution (have age parameter)
Main idea: Successively add new units to an initially small network by evaluating local stats of previous step
(same approach as growing cell structure fritzke 1994b, but this has fixed dimensionality)
Here dimensionality depends on input, may vary locally.
Have parameters for how much to move s1 and neighbours; a max max age for edges;
Param lambda to divide algo into sections; alpha constant for initializing new unit error;
Constant d for decreasing error
Algo:
 Start with 2 units at random positions in rn space
 Sample input example
 Find the nearest 2 units to the input example s1 and s2
 Increment age of all edges from s1
 Delta error (s1) = squared error (w s1  data) (local counter vars)
 Move s1 and all its neighbours towards input data by param*(diff data  node)
 If s1 and s2 are connected by edge, set age to 0, or create it (Hebbian in nature – correlation=connection)
 Remove edges with age larger than a max, if points have no edges delete them
 Every lambda inputs, add new unit:
Determine unit with max accumulater error q
Insert new unit r halfway between q and its neightbour f (highest error of the neighbours)
W r = 0.5 (wq + wf)
Connect r to q and f, remove original edge q f
New error for q and f is costant alpha times themselves, error r =error q
 Decrease all error by multiplying with d
 If not yet done go back to step 1
(why not descrfibe the params needed? )
Removing edges – their end points have moved
Introduce new nodes in very erroneous areas
Simulation Results
4. Simulation results
 tested on model used for original NG – quickly learns important
(of diff dimens but how???)
60020000 inputs, l = 100, e b = 0.2, e n = 0.006; alpha = 0.5; a max = 50, d = 0.995
 2^{nd} test compares to NG as well, final is similar,100 units , 10k steps
Intermediate are different
Both identify clusters, GNG could continue to grow to discover small clusters (but none are present here)
Testing is pretty sketchy, don’t show the advantages, don’t show that its actually reversible mapping? Don’t show the power of not limiting the number of units before hand (this was identified as a problem in original NG?)
What is the stopping criterion?
Conclusions
5.
Able to work to model
Does not need size
All parametes constant over time
Haven’t really shown the first 2
Note: topology is not extra
Need proper initialization of new units to have constant params and local adaptations
Possible applications: clustering, VQ
Combine with supervised learning (like has been done with GCS frizke 1994c)
 Possible to choose arbitrary insertion criterion
Evolving SelfOrganizing Maps
Online pattern analysis by evolving selforganizing maps
(online pattern analysis)
Introduction
1. Problem: process online data streams that change often
Input/output distributions are unknown and may change over time
Parameters need to be updated incrementally (sounds more like a solution rather than a problem?)
Limited memory
 Required efficiency and flexibility – neural network models
Network size in kmeans/SOM/NG is fixed
Prototype space of SOM is also fixed
But desire structure and size to evolve with incoming data
Fritke – GCS and growing models
Propose: ESOM – soft (unlike k means cant get stuck in local minima) and evolvable network and fast incremental learning
Data clustering and classification – competitive with other approaches
ESOM
2. ESOM
Here is the main "build on" of this paper on the first paper
The structure of the ESOM network is as in the GNG (builds on the idea that these kinds of things can be done incrementally)
 null network embedded in input space, new nodes created incrementally, with new input its 2 winners get connected,
 if no nodes match with current input new node is created
learning rule insert here  could be specified to a neural gas or SOM
new node insertion  exactly the poorly matched value, not like in GNG (GNG will take more iterations to tune new node to a new cluster)
doing it this way can introduce noise/artefacts, but can be solved by automatic deletion of obsolete nodes
Aim: lifelong learning, strict convergence is not critical
learning rate not drops to zero asympotically
learning rule similar to Heskes and Kappen [12]  optimal exists
usually is 0.05 constant
distance threshold epsilon  trial and error
forgetting constant 0.8 (hebbian learning with forgetting)
no topological order as in SOM, so need connections to define neighbours > save time in NG, use for vis
Algo:
1) input new data vector x
2) if there are no matching go to INSERTION
3) MATCHING find winner, if min distance is smaller than threshold go to UPDATING, otherwise to INSERTION
4) INSERTION create new, connect with 2 best matching
5) UPDATING modify winner, neightbours and connections eqs 13 and 14
eq 13  learning rule of ESOM
eq 14  connection stength
6) DELETION every Tp steps prune connection with weakest stength
7) repeat from step 1 until no more data
time complexity comparable to GCS/GNG
search is O(N), GCS best, GNG/ESOM 2 best more efficient than neural gas O(N log N)
in ESOM no track of error and node with largest error
Simulations
3. Simulations
 colour image quantization
 2 spiral classification
 diabetes diagnosis
 vowel recognition
Colour image quantization as online data clustering
reduce number of colours in digital colour image, closely related to image compression
Problem: timeconsuming
Compare to: 2 basic regular approaches to this: mediancut, octree, k means, wus method [26]
quantize pixel by pixel only k means and ESOM online
3 images chosen 24bit, true color, smooth colour, details, shades, many colours mixed, images popular in image processing
ESOM best results for the 2 natural images, close to wus for the artificial one
ESOM images have better visual quality
Benchmark tests
ESOM is unsupervised, but can be applicable in supervised setting (just like SOM/other clustering) 1 nn approach
data is coordinates
CMU 2 spiraldata, ESOM needs fewer prototypes for 0 error rate, 1 epoch  single pass through data set (GCG over 100 epochs)
Compared to: GCS, LVQ, SOM
Pima Indian diabetes set UCI rep
data is personal data and if positive or not
10fold cross validation, data presented 5 times in random order, average performance
epsilon = 0.4
Compared to knn, class/regression trees, MLP, LVQ, LDA  pretty similar
GNG/GCS only 1 epoch, similar accuracy, GNG needs less nodes but many more epochs
CMU data set
input is speech signals, 4 male, 4 female, 990 frames
off line to set params epsilon = 0.5, learning = 0.05
Compated to perceptrons, nn, GGS. least weights for similar performance, only 1 pass learning
Extend to continue training online: reduces number of errors on diabetes/vowel sets
Conclusions
4. Discussion
vis is not strainght forward as we are in the original input space
compared to regular SOM results are better, less time needed to train, more visual clues as better clustering/proximity
for use in info retrieval do threshold scales  hierarchial feature maps
aggregation to reduce prototype number
selftuning of system params
apps: multimedia info retrieval, online finincial data analysis
Discussion
Contributions
they are pretty explicit about the similiarities/differences with the GNG model/algo, although they use sometimes different terminology/maths
main idea is to use the idea that online learning needs incremental change in the network and the fact that this is shown with NG
so they try a very similar thing with a SOM, to increase its fit to the online learning domain
this isnt a very straight forward/easy extension, as there are many changes and they also talk a lot about NG and CGS and some other models that share some aspects
seems like they are putting together little things from many already explored/iplemented/tried places and apply to a new (sorta, as its similar and NG was inspired by SOM?) setting
Evaluations
benchmarks show about 100 nodes needed (scale of 100)
some results to compare are taken from other papers, not same authors  might be some details that are important missing
ESOM is tested in much more detail than as the GNG presented, with more sets, compared with more algorithms, on more diverse data.
Makes it seem like over time the emperical results have shown to be needed to be properly emphasized and executed.
also maybe more data available and more comparisons/algorithms/uses for such algorithms
Improvements
Reversible mapping does not exist in general, so some info can be lost
Topology learning: what structures can be mapped reversably?
It would be great to see theoretical proofs of performance and how much topology can be saved for sure in these methods
Annotated Bibliography
Put your annotated bibliography here. Add links where appropriate.
To Add
Put links and content here to be added. This does not need to be organized, and will not be graded as part of the page. If you find something that might be useful for a page, feel free to put it here.
 summary, abstract, related to, builds on
 text of part 1
 text of part 2
 text of part 3
 references
 images?
