Course:CPSC 522/Self-Organizing Maps
- 1 Self-Organizing Maps
- 2 Abstract
- 3 Content
- 3.1 Background
- 3.2 Algorithm
- 3.3 Applications
- 4 Annotated Bibliography
- 5 To Add
The self-organizing map is a dimenionality-reduction approach that stems from the similarity-based structure of the brain.
Principal Author: Svetlana Sodol
Self-organizing map is an automatic data-analysis method, that allows to obtain an insight into the topographical relationships of the input data.
We introduce the concept of a self-organizing map, its computing and biological backgrounds. We describe the algorithms for implementation including a batch approach. We mention existing open-source implementations and list application areas.
The formulation of the self-organizing map (SOM) is attributed to Teuvo Kohonen (and SOMs are also referred to as Kohonen maps).
The main idea is to preserve similarity structure (topology) of the high-dimensional input data in the lower-dimensional "map" produced.
Kohonen formulates the background for SOMs as grounded in biology. In the brain, specific areas specialize on inputs that are similar. It has been found that neural cells in the brain often form assemblies, called brain maps, in which their topographic location corresponds to some feature value of a specific stimulus in an orderly fashion.. For example, all the visual input is processed separately from auditory input, and inputs from visual cells close together stay close together (retinotopic organization). It has been shown that these maps are not pre-determined genetically but are modified strongly by experiences (Merzenich 1983) and thus are formed instead by learning.
Brain maps lead to some further important realizations about the brain:
1) this process is general to how the brain functions (generalizable to type of input)
2) what the brain cells get has been pre-processed and therefore is abstract (not linearly related to the actual sensory experience)
Thus, abstract representations can form at any level of abstraction using such a similiarity-based processing model, and we can model brain maps that operate on semantic items (the emergence of operation on meaning that is actually grounded in neurons)
and "how symbolic representation for concepts can be formed automatically", which has been a big idea for cognitive science .
This has lead to development of the early artificial feature-sensitive cells through competitive learning neural networks.
An optimally tuned feature-sensitive filter by competitive learning has already been demonstrated in abstract form in signal processing. It was the classical vector quantization (VQ) algorithm (alternatively known as the k-means clustering - we represent each data point by its closest centroid or codebook vector). SOM resembles the VQ heavily, but differs from other artificial neural networks approaches in that it does not use error-correction. What distinguishes the SOM from the VQ is that the centroids are also spatially and globally ordered.
In a 2013 paper, Kohonen summarizes the SOM algorithm as follows:
"Every input data item shall select the model that matches best with
the input item, and this model, as well as a subset of its spatial
neighbors in the grid, shall be modified for better matching."
Similar to VQ (and k-means implementations) the modifications are centred on the winning model and the local changes are propagated over time.
There are two versions for SOM algorithm:
1) Incremental: recursive stepwise approximation where the inputs are fed to the network one at a time.
2) Batch: all data is applied as a batch where all model nodes are updated concurrently.
In the image on the right you can see a representation of what the SOM model accomplishes on the input data. This image is taken from the original Wikipedia page on SOMs and was made by user Chompinha.
SOM is usually structured as a 2D regular grid of nodes. Every node is a representation (model denoted by ) of the input observations. Random or pre-organized initializations for the model values work, dependent on the task.
The inputs are denoted as for iteration .
The grid is automatically organized such that similar models are placed closer to each other than to dissimilar ones, thus being "both a similiarity graph and a clustering diagram".
After initialization, the incremental algorithm has two main steps that are repeated for many iterations until convergence:
For each input :
1) Get the "winner" model for the input
2) Update the models
Finding the "winner"
The "winner" model is defined by the following condition:
where is the current model value at node
is the value of the current observation
Thus the "winner" model found in step 1 is the model that matches best (is closest) with the current input. The distance between the models and the inputs is usually Euclidean.
Updating the models
For step 2 the following is the update function - at iteration calculate the value for each model at iteration :
is the neighbourhood function value for nodes and
is the "winner" model's node for the current iteration from step 1.
The neighbourhood function can be described simply as if is smaller than given radius from node and otherwise .
Here, is the learning rate, that decreses with the number of iterations,
(for the 2D case) are the grid locations for the model node being updated and the winner model node
The radius of the nodes that need to be updated also decreases with the number of iterations. From this definition we can form the set of nodes that are within the radius and refer to it as the neighbourhood set .
The definition of the neighbourhood function is often a Gaussian:
The batch implementation does not require specification of learning factor and is often faster in practice.
The batch algorithm is an iterative algorithm, where the update is performed on all of the data at the same time and might require many run throughs, until we reach convergence at iteration and the final models .
This is the update function:
For both algorithms, we run the update for many iterations, until we reach a stable solution - the models converge i.e. are not changed much between iterations.
In mathematical terms we want to reach the following stationary state condition:
SOMs are very useful for visualization, as the spatial order of the map provides a convenient and quick visual inspection of the similarity relationships of the input data and their clustering tendency. This can also be used to validate samples. And with proper calibration, the clusterings/classes become explicit. See the figure to the right for an example. This image is taken from https://peltarion.com/platform and featured on the regular Wikipedia page for SOMs.
Management of large document collections
However, SOMs have mostly been used in management of large document collections.
In 2000, Kohonen et al used SOM to manage an electronic version of Encyclopaedia Britiannica.
A total of 115 000 documents were used, with vocabulary size of 39 058 words. Inverse document frequency was used to weight the words to represent the documents in an abstract way. The SOM grid contained 12 096 nodes. The end result is a map, where each document is mapped onto its best matching node. In a browsing mode, the user can click on a particular node and see a list of all of the documents associated with that node. For example, clicking on a node titled "shark" one would see links to articles titled "fox shark", "blacktip shark" and "shark: Description and habits ". Other modes of usage can also be implemented.
Some other application areas include:
- Industrial process control
- Finance analyses
- Bioinformatics and biomedical analysis
- Profiling criminal behavior
- Categorization of galaxies
- Clustering and classification of visual data 
- Classification of diabetes 
- Vowel recognition 
- SOM can handle large volumes of data
- SOM can work on large mappings in reasonable time even on a personal computer
- Provides good visualizations, which are a hard problem for multi-dimensional data
- Flexible in applications - can use SOM to cluster, classify, identify outliers, etc
- Can use the stepwise algorithm to add new examples online
- Based on biology
- SOM does not seem to be used as a stand alone - as with other learning methods there is a lot of pre-processing
- Similiarity (the main measure of closeness for SOM) is hard to define
- SOM does not make a generative model of the data - focuses on modelling the trends it sees in the data without assumptions of where the data is coming from (in the simple original SOM case)
- Biological basis is often ignored in implementations/extensions/applications
- Kohonen, Teuvo (1982). "Self-organized formation of topologically correct feature maps". Biological Cybernetics. 43: 59–69.
- Kohonen, Teuvo (January 2013). "Essentials of the self-organizing map". Neural Networks. 37: 52–65.
- Kohonen, Teuvo (May 1998). "The self-organizing map". Neurocomputing. 21 (1-3): 1–6.
- Wittek, Peter; Gao, Shi Chao (June 2017). "Somoclu: An Efficient Parallel Library for Self-Organizing Maps" (PDF). Journal of Statistical Software. 78 (9).
- Furao, Shen; Ogura, Tomotaka (2007). "An enhanced self-organizing incremental neural network for online unsupervised learning". Neural Networks. 20: 893–903.
- Deng, Da; Kasabov, Nikola (2003). "On-line pattern analysis by evolving self-organizing maps". Neurocomputing. 51: 87–103.
is the following needed?
For some other pages:
- history of formation of the SOM algorithms and basis for the decisions
- initialization of parameters
- theoretical implications
- applications and comparison to other approaches
- impact on brain modelling progress?
- big problem in AI/roboitcs - grounding of meaning - related?