

Author: Aprajita Khurana

Self-Organising Maps (also known as Kohonen maps) are a type of artificial neural network introduced by the Finnish professor Teuvo Kohonen in the 1980s. They are a very useful technique for cluster analysis and data exploration.










Self-organizing maps differ from other artificial neural networks in that they apply competitive learning rather than error-correction learning (such as backpropagation with gradient descent or SGD), and in that they use a neighborhood function to preserve the topological properties of the input space.

An SOM is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map; it is therefore a method of dimensionality reduction.

It also helps us discover correlations in the data.



An SOM has a two-layer architecture, the layers being:

  • Input Layer.

  • Output layer or Feature Map.

SOMs have no activation function in their neurons; inputs are passed directly to the output layer, and each neuron in an SOM is assigned a weight vector with the same dimensionality as the input space.
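As a minimal sketch of that structure (grid size and input dimensionality here are illustrative, not prescribed):

```python
import numpy as np

# Hypothetical sizes for illustration: a 10x10 output grid, 4-dimensional input.
grid_rows, grid_cols, input_dim = 10, 10, 4

rng = np.random.default_rng(0)
# One weight vector per output neuron, with the same dimensionality as the input.
weights = rng.random((grid_rows, grid_cols, input_dim))

print(weights.shape)  # (10, 10, 4)
```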


How does this work?

Each data point in the data set competes for representation. SOM training starts by initialising the weight vectors. A sample vector is then selected at random, and the map of weight vectors is searched to find the weight that best represents that sample. Each weight vector has neighbouring weights that are close to it. The chosen weight is rewarded by being allowed to become more like the randomly selected sample vector, and its neighbours are rewarded in the same way. This allows the map to grow and form different shapes; most commonly, it forms square, rectangular, hexagonal, or L shapes in the 2D feature space.



As mentioned before, an SOM uses competitive learning to update its weights, and this type of learning can be divided into three sub-processes:

  • Competition.

  • Cooperation.

  • Adaptation.


Competition

The distance between each neuron (in the output layer) and the input data is computed. The neuron with the lowest distance is declared the winner.

Metric used: Euclidean distance.
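The competition step can be sketched in NumPy as follows (the grid and input values below are made up for illustration):

```python
import numpy as np

def find_bmu(weights, x):
    """Competition step: return the grid index of the neuron whose
    weight vector has the smallest Euclidean distance to input x."""
    distances = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(distances), distances.shape)

# Toy 3x3 grid of 2-D weights: all zeros except the neuron at (2, 1).
weights = np.zeros((3, 3, 2))
weights[2, 1] = [1.0, 1.0]

print(find_bmu(weights, np.array([0.9, 1.1])))  # (2, 1)
```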


Cooperation

The winning neuron determines the spatial location of a topological neighbourhood of excited neurons, thereby providing the basis for cooperation among neighbouring neurons. The winner neuron's neighbours must also be updated, using the neighbourhood kernel function.

The neighbourhood kernel function depends on 2 factors:

  • Time: incremented with each new input data.

  • Distance: between the winner neuron and the other neuron.
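A common choice is a Gaussian kernel whose radius shrinks over time; a sketch, with illustrative hyperparameters sigma0 and tau (not values prescribed by the text):

```python
import numpy as np

def neighbourhood(winner, node, t, sigma0=2.0, tau=100.0):
    """Gaussian neighbourhood kernel depending on the two factors above:
    time t (the radius sigma shrinks as t grows) and the grid distance
    between the winner neuron and the other neuron."""
    sigma = sigma0 * np.exp(-t / tau)                        # radius decays with time
    d2 = (winner[0] - node[0]) ** 2 + (winner[1] - node[1]) ** 2  # squared grid distance
    return np.exp(-d2 / (2 * sigma ** 2))

print(neighbourhood((0, 0), (0, 0), t=0))  # 1.0 at the winner itself
```

Nodes farther from the winner, and updates later in training, both receive smaller kernel values.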



Adaptation

The excited neurons decrease their individual values of the discriminant function in relation to the input pattern through suitable adjustment of the associated connection weights, such that the response of the winning neuron to a subsequent application of a similar input pattern is enhanced.

Formula used:

w_j(t+1) = w_j(t) + η(t) · h_{c,j}(t) · (x − w_j(t))

Here η(t) is a time (epoch t) dependent learning rate and h_{c,j}(t) is the neighbourhood kernel centred on the winning neuron c; the updates are applied for all the training patterns x over many epochs. The effect of each weight update is to move the weight vectors w_j of the winning neuron and its neighbours towards the input vector x. Repeated presentation of the training data thus leads to topological ordering.
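A minimal NumPy sketch of this adaptation step, with a Gaussian neighbourhood; the learning rate and width below are illustrative values (in practice both decay with time):

```python
import numpy as np

def adapt(weights, x, bmu, eta=0.5, sigma=1.0):
    """Adaptation step: w_j <- w_j + eta * h(j) * (x - w_j), where h(j)
    is a Gaussian in the grid distance between node j and the BMU."""
    rows, cols = np.indices(weights.shape[:2])
    d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-d2 / (2 * sigma ** 2))               # neighbourhood values in [0, 1]
    return weights + eta * h[..., None] * (x - weights)

w = np.zeros((3, 3, 2))
w2 = adapt(w, np.array([1.0, 1.0]), bmu=(1, 1))
print(w2[1, 1])  # the winner moved halfway towards x: [0.5, 0.5]
```

Note how neighbours of the BMU move towards x as well, just by a smaller amount.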





In summary, the training algorithm is:

  • Each node’s weights are initialized.

  • A vector is chosen at random from the set of training data.

  • Every node is examined to calculate which one’s weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).

  • Then the neighbourhood of the BMU is calculated. The number of neighbours decreases over time.

  • The winning weight is rewarded by becoming more like the sample vector. The neighbours also become more like the sample vector. The closer a node is to the BMU, the more its weights are altered; the farther a neighbour is from the BMU, the less it learns.

  • Repeat step 2 for N iterations.

The Best Matching Unit is found by running through all weight vectors and calculating the distance from each weight to the sample vector. The weight with the shortest distance is the winner (as discussed in the competition step).
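The steps above can be combined into one training loop. This is a sketch under assumed hyperparameters and decay schedules (exponential decay for both the learning rate and the neighbourhood radius), not the only possible choice:

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=50, eta0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM training loop following the listed steps.
    All hyperparameter names and values are illustrative."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid[0], grid[1], data.shape[1]))  # step 1: initialize
    rows, cols = np.indices(grid)
    for t in range(epochs):
        eta = eta0 * np.exp(-t / epochs)        # learning rate decays with time
        sigma = sigma0 * np.exp(-t / epochs)    # neighbourhood radius shrinks
        for x in data[rng.permutation(len(data))]:  # step 2: random samples
            # step 3: competition - find the Best Matching Unit
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # step 4: cooperation - Gaussian neighbourhood around the BMU
            d2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
            h = np.exp(-d2 / (2 * sigma ** 2))
            # step 5: adaptation - move weights towards the sample
            weights += eta * h[..., None] * (x - weights)
    return weights

# Two well-separated toy blobs; their BMUs should end up at different nodes.
data = np.vstack([np.full((20, 2), 0.1), np.full((20, 2), 0.9)])
w = train_som(data)
```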


An illustration of the training of a self-organizing map. The blue blob is the distribution of the training data, and the small white disc is the current training datum drawn from that distribution. At first (left) the SOM nodes are arbitrarily positioned in the data space. The node (highlighted in yellow) which is nearest to the training datum is selected. It is moved towards the training datum, as (to a lesser extent) are its neighbours on the grid. After many iterations the grid tends to approximate the data distribution (right).


Many Python libraries, such as MiniSom and SOMPY, can be used to implement Self-Organising Maps.




On the iris dataset:



If the average distance is high, the surrounding weights are very different and a light colour is assigned to that location; if the average distance is low, a darker colour is assigned. The resulting maps show that the different clusters of species are concentrated in three zones. The first figure only tells us where the density of species is greater (darker regions) or smaller (lighter regions); the second visualisation tells us how they are specifically clustered.


Self-organizing maps, like most artificial neural networks, operate in two modes: training and mapping. First, training uses an input data set (the "input space") to generate a lower-dimensional representation of the input data (the "map space"). Second, mapping classifies additional input data using the generated map. In most cases, the goal of training is to represent an input space with p dimensions as a map space with two dimensions. Specifically, an input space with p variables is said to have p dimensions. A map space consists of components called "nodes" or "neurons," which are arranged as a hexagonal or rectangular grid with two dimensions. The number of nodes and their arrangement are specified beforehand based on the larger goals of the analysis and exploration of the data.

Each node in the map space is associated with a "weight" vector, which is the position of the node in the input space. While nodes in the map space stay fixed, training consists of moving weight vectors toward the input data (reducing a distance metric such as Euclidean distance) without spoiling the topology induced from the map space. After training, the map can be used to classify additional observations from the input space by finding the node with the closest weight vector (smallest distance metric) to the input space vector.


