# Course:CPSC 522/Self-Organizing Maps

## Self-Organizing Maps

The self-organizing map is a dimensionality-reduction approach inspired by the similarity-based structure of the brain.

Principal Author: Svetlana Sodol
Collaborators:

## Abstract

The self-organizing map is an automatic data-analysis method that provides insight into the topographical relationships of the input data.

We introduce the concept of a self-organizing map, its computing and biological backgrounds. We describe the algorithms for implementation including a batch approach. We mention existing open-source implementations and list application areas.

### Builds on

Self-organizing maps are a type of unsupervised artificial neural network.

### Related Pages

Self-organizing maps are related to vector quantization methods, with some biological connections and opportunity for emergent properties. They can be compared to PCA methods for dimensionality reduction.

## Content

### Background

The formulation of the self-organizing map (SOM) is attributed to Teuvo Kohonen (and SOMs are also referred to as Kohonen maps)[1].

The main idea is to preserve similarity structure (topology) of the high-dimensional input data in the lower-dimensional "map" produced.

#### Biological basis

Kohonen grounds the background for SOMs in biology. In the brain, specific areas specialize in processing similar inputs. It has been found that neural cells in the brain often form assemblies, called brain maps, in which their topographic location corresponds in an orderly fashion to some feature value of a specific stimulus.[2] For example, all visual input is processed separately from auditory input, and inputs from visual cells that are close together stay close together (retinotopic organization). It has been shown that these maps are not pre-determined genetically but are modified strongly by experience (Merzenich 1983) and thus are formed by learning.

Kohonen draws two conclusions from these brain maps:

1) this process is general to how the brain functions (generalizable to any type of input)

2) what the brain cells receive has been pre-processed and is therefore abstract (not linearly related to the actual sensory experience)

Thus, abstract representations can form at any level of abstraction using such a similarity-based processing model, and we can model brain maps that operate on semantic items (the emergence of operations on meaning that are nonetheless grounded in neurons) and show "how symbolic representation for concepts can be formed automatically", which has been a big idea for cognitive science [3].

This has led to the development of early artificial feature-sensitive cells through competitive-learning neural networks.

#### Computational connection

An optimally tuned feature-sensitive filter obtained by competitive learning has already been demonstrated in abstract form in signal processing: the classical vector quantization (VQ) algorithm (alternatively known as k-means clustering, in which we represent each data point by its closest centroid, or codebook vector). SOM closely resembles VQ, but differs from other artificial neural network approaches in that it does not use error correction. What distinguishes the SOM from VQ is that the centroids are also spatially and globally ordered.[2]

### Algorithm

In a 2013 paper, Kohonen summarizes the SOM algorithm as follows:

"Every input data item shall select the model that matches best with the input item, and this model, as well as a subset of its spatial neighbors in the grid, shall be modified for better matching."[2]

Similar to VQ (and k-means implementations), the modifications are centred on the winning model, and the local changes are propagated over time.

*Figure: Visualization of how SOM updates its model nodes to fit the topology of the input data.*

There are two versions of the SOM algorithm:

1) Incremental: recursive stepwise approximation where the inputs are fed to the network one at a time.

2) Batch: all data is applied as a batch where all model nodes are updated concurrently.

In the image on the right you can see a representation of what the SOM model accomplishes on the input data. This image is taken from the original Wikipedia page on SOMs and was made by user Chompinha.

#### SOM Structure

The SOM is usually structured as a 2D regular grid of nodes. Every node holds a model (denoted by ${\displaystyle m_{i}\in \Re ^{n}}$) that represents the input observations. Random or pre-organized initializations of the model values both work, depending on the task.

The inputs are denoted as ${\displaystyle x(t)\in \Re ^{n}}$ for iteration ${\displaystyle t}$.

The grid is automatically organized such that similar models are placed closer to each other than to dissimilar ones, thus being "both a similarity graph and a clustering diagram".[3]

#### Incremental algorithm

After initialization, the incremental algorithm has two main steps that are repeated for many iterations until convergence:

For each input:

1) Get the "winner" model for the input

2) Update the models

###### Finding the "winner"

The "winner" model ${\displaystyle m_{c}}$ is defined by the following condition:

${\displaystyle \forall i,\lVert x(t)-m_{c}(t)\rVert \leq \lVert x(t)-m_{i}(t)\rVert }$

where ${\displaystyle m_{i}}$ is the current model value at node ${\displaystyle i}$ and ${\displaystyle x(t)}$ is the value of the current observation.

Thus the "winner" model found in step 1 is the model that matches best (is closest to) the current input. The distance between the models and the inputs is usually Euclidean.
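Step 1 can be sketched as follows (a minimal illustration, assuming the models are stored as rows of a NumPy array; the function name is ours, not from the source):

```python
import numpy as np

def find_winner(x, models):
    """Index c of the model closest to input x (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(models - x, axis=1)))

# Three 2D models; the input is closest to the model at index 1
models = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(find_winner(np.array([0.9, 1.2]), models))  # -> 1
```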

###### Updating the models

For step 2, the update function at iteration ${\displaystyle t}$ gives the value of each model ${\displaystyle i}$ at iteration ${\displaystyle t+1}$:

${\displaystyle m_{i}(t+1)=m_{i}(t)+h_{c(x),i}(x(t)-m_{i}(t))}$

where ${\displaystyle h_{c(x),i}}$ is the neighbourhood function value for nodes ${\displaystyle i}$ and ${\displaystyle c(x)}$, and ${\displaystyle c(x)=c}$ is the "winner" model's node for the current iteration from step 1.
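The update rule above can be sketched for all nodes at once (an illustrative snippet, assuming the neighbourhood weights have already been computed into a vector `h`, one entry per model node):

```python
import numpy as np

def update_models(models, x, h):
    """m_i(t+1) = m_i(t) + h_i * (x(t) - m_i(t)), applied to every node i."""
    return models + h[:, None] * (x - models)

models = np.array([[0.0, 0.0], [1.0, 1.0]])
h = np.array([1.0, 0.5])   # winner takes a full step, neighbour a half step
x = np.array([2.0, 2.0])
print(update_models(models, x, h))  # both models move toward x
```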

###### Neighbourhood function

The neighbourhood function can be described simply as ${\textstyle h_{c(x),i}=\alpha (t)}$ if ${\textstyle \lVert r_{i}-r_{c}\rVert }$ is smaller than a given radius from node ${\displaystyle c}$, and ${\textstyle h_{c(x),i}=0}$ otherwise.

Here, ${\textstyle 0\leq \alpha (t)\leq 1}$ is the learning rate, which decreases with the number of iterations,

and ${\textstyle r_{i},r_{c}\in \Re ^{2}}$ (for the 2D case) are the grid locations of the model node ${\displaystyle i}$ being updated and the winner model node ${\displaystyle c}$.

The radius of the nodes that need to be updated also decreases with the number of iterations. From this definition we can form the set of nodes that are within the radius and refer to it as the neighbourhood set ${\displaystyle N_{c}}$.

The neighbourhood function is often defined as a Gaussian:

${\displaystyle h_{c(x),i}=\alpha (t)\exp {\Bigl (}-{\frac {\lVert r_{i}-r_{c}\rVert ^{2}}{2\sigma ^{2}(t)}}{\Bigr )}}$

where ${\displaystyle \sigma (t)}$ is the width, which decreases with the number of iterations (it controls the radius of the neighbourhood set).
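A Gaussian neighbourhood can be sketched as below. The exponential decay schedules for ${\displaystyle \alpha (t)}$ and ${\displaystyle \sigma (t)}$ are illustrative design choices (the SOM only requires that both decrease over time), and all parameter names are ours:

```python
import numpy as np

def neighbourhood(grid, c, t, alpha0=0.5, sigma0=2.0, tau=1000.0):
    """Gaussian neighbourhood weights h_{c,i} for every node i on the grid."""
    alpha = alpha0 * np.exp(-t / tau)            # decreasing learning rate
    sigma = sigma0 * np.exp(-t / tau)            # shrinking neighbourhood width
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)   # squared grid distance to winner
    return alpha * np.exp(-d2 / (2.0 * sigma ** 2))

# 1D grid of 3 nodes; the winner is node 0
grid = np.array([[0.0], [1.0], [2.0]])
h = neighbourhood(grid, c=0, t=0)
print(h)  # largest at the winner, decaying with grid distance
```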

#### Batch algorithm

The batch implementation does not require specification of the learning rate ${\displaystyle \alpha (t)}$ and is often faster in practice.

The batch algorithm is iterative: the update is performed on all of the data at the same time and may require many passes through the data until convergence is reached at iteration ${\displaystyle t_{f}}$, yielding the final models ${\textstyle m_{i}^{*}}$.

This is the update function[4]:

${\displaystyle m_{i}^{*}(t_{f})={\frac {\sum _{t=t_{0}}^{t_{f}}h_{c(x),i}x(t)}{\sum _{t=t_{0}}^{t_{f}}h_{c(x),i}}}}$
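One pass of the batch update can be sketched as a weighted average over all inputs (an illustrative snippet; here `H[k, i]` holds the neighbourhood weight ${\displaystyle h_{c(x),i}}$ for input ${\displaystyle k}$ and node ${\displaystyle i}$, with the matrix layout being our own convention):

```python
import numpy as np

def batch_update(X, H):
    """Each model becomes the H-weighted average of all inputs X."""
    return (H.T @ X) / H.sum(axis=0)[:, None]

X = np.array([[0.0], [2.0], [10.0]])   # three 1D inputs
H = np.array([[1.0, 0.5],              # neighbourhood weights per input/node
              [1.0, 0.5],
              [0.0, 1.0]])
print(batch_update(X, H))  # node 0 averages the first two inputs
```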

#### Convergence condition

For both algorithms, we run the updates for many iterations until we reach a stable solution: the models converge, i.e., change little between iterations.

In mathematical terms we want to reach the following stationary state condition: ${\textstyle \forall i,\mathrm {E} _{t}\{{h_{c(x),i}(x-m_{i}^{*})}\}=0}$
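In practice, a simple proxy for this stationary-state condition is to stop when no model moves more than a small tolerance between iterations (a sketch; the tolerance value is an arbitrary choice):

```python
import numpy as np

def has_converged(models_prev, models_next, tol=1e-6):
    """True when the largest per-node movement falls below tol."""
    return float(np.max(np.linalg.norm(models_next - models_prev, axis=1))) < tol

a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = a + 1e-8
print(has_converged(a, b))  # True: the models changed negligibly
```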

#### Implementations

Kohonen and his group have developed and released freeware software packages: SOM_PAK and the SOM Toolbox. There is also an available parallel implementation, Somoclu.[4]

### Applications

#### Visualization

SOMs are very useful for visualization, as the spatial order of the map provides a convenient and quick visual inspection of the similarity relationships of the input data and their clustering tendency. This can also be used to validate samples, and with proper calibration the clusterings/classes become explicit.[2] See the figure to the right for an example. This image is taken from https://peltarion.com/platform and is featured on the regular Wikipedia page for SOMs.

*Figure: An example of the result of using SOM to cluster data.*

#### Management of large document collections

To date, SOMs have mostly been used in the management of large document collections.

In 2000, Kohonen et al. used a SOM to manage an electronic version of Encyclopaedia Britannica.

A total of 115 000 documents were used, with a vocabulary of 39 058 words. Inverse document frequency was used to weight the words, representing the documents in an abstract way. The SOM grid contained 12 096 nodes. The end result is a map where each document is mapped onto its best-matching node. In browsing mode, the user can click on a particular node and see a list of all documents associated with that node. For example, clicking on a node titled "shark", one would see links to articles titled "fox shark", "blacktip shark" and "shark: Description and habits". Other modes of usage can also be implemented.
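The inverse-document-frequency weighting mentioned above can be illustrated with a minimal sketch (the actual experiment used a much larger, more elaborate pipeline; this only shows the core idea that rare words weigh more):

```python
import numpy as np

def idf_weights(counts):
    """counts: (n_docs, n_words) term-count matrix -> one weight per word."""
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)  # documents containing each word
    return np.log(n_docs / df)             # rarer words get larger weights

counts = np.array([[2, 0, 1],
                   [1, 1, 0],
                   [3, 0, 0]])
print(idf_weights(counts))  # word 0 appears everywhere, so its weight is 0
```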

#### Other applications

Some other application areas include:

• Industrial process control
• Financial analysis
• Bioinformatics and biomedical analysis
• Profiling criminal behavior
• Categorization of galaxies
• Clustering and classification of visual data [5]
• Classification of diabetes [6]
• Vowel recognition [6]

#### Benefits

• SOM can handle large volumes of data
• SOM can work on large mappings in reasonable time even on a personal computer
• Provides good visualizations, which are a hard problem for multi-dimensional data
• Flexible in applications - can use SOM to cluster, classify, identify outliers, etc
• Can use the stepwise algorithm to add new examples online
• Based on biology

#### Pitfalls

• SOM does not seem to be used as a standalone method; as with other learning methods, a lot of pre-processing is required
• Similarity (the main measure of closeness for SOM) is hard to define
• SOM is not a generative model of the data: it models the trends it sees in the data without assumptions about where the data comes from (in the simple original SOM case)
• Biological basis is often ignored in implementations/extensions/applications

## Annotated Bibliography

1. Kohonen, Teuvo (1982). "Self-organized formation of topologically correct feature maps". Biological Cybernetics. 43: 59–69.
2. Kohonen, Teuvo (January 2013). "Essentials of the self-organizing map". Neural Networks. 37: 52–65.
3. Kohonen, Teuvo (May 1998). "The self-organizing map". Neurocomputing. 21 (1–3): 1–6.
4. Wittek, Peter; Gao, Shi Chao (June 2017). "Somoclu: An Efficient Parallel Library for Self-Organizing Maps" (PDF). Journal of Statistical Software. 78 (9).
5. Furao, Shen; Ogura, Tomotaka (2007). "An enhanced self-organizing incremental neural network for online unsupervised learning". Neural Networks. 20: 893–903.
6. Deng, Da; Kasabov, Nikola (2003). "On-line pattern analysis by evolving self-organizing maps". Neurocomputing. 51: 87–103.

## To Add

Possibly needed for this page:

- code/pseudocode
- example
- annotations

For some other pages:

- history of the formation of the SOM algorithms and the basis for the decisions
- initialization of parameters
- theoretical implications
- applications and comparison to other approaches
- impact on brain-modelling progress?
- big problem in AI/robotics (grounding of meaning): related?

Permission is granted to copy, distribute and/or modify this document according to the terms of the Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0