Segmentation refers to the division of an image into multiple segments or regions based on certain characteristics, such as color, intensity, texture, or edges. Its purpose is to simplify the representation of images for analysis and understanding by dividing them into reasonable parts.
In image classification, the image as a whole or its regions are assigned to predefined classes based on various specified or learned characteristics. Classification is often part of segmentation, they are not entirely separate concepts. Recognizing what animal is shown in a photo is a classification task.
Clustering is a machine learning technique that finds naturally occurring groups without prior knowledge of classes or categories. No annotated training data is required, the algorithm forms the groups by itself based on the data as a whole. A simple example is the recommendation system used by streaming providers, which is based on user behavior and does not use specific classes, but rather similar patterns.
Binary images can be easily segmented in two steps. In the first step, the algorithm looks at the neighbours of each white pixel from top to bottom, from left to right. If they are all background, a new label is assigned to the current pixel. If a neighbour is not background and is identical, then the current pixel's label gets the label from that neighbour. If there are foreground neighbours with different labels, the lowest label is assigned and then the equivalence is remembered. The second step relabels matching pixels based on the equivalences.
The algorithm grows regions starting from selected points based on the similarity of neighbouring pixels. It is important to properly specify a tolerance, which is the allowed percentage difference between pixels. A list of neighbours to be visited is created, then their neighbours are also added to the list, pixels already visited are removed from the list. The process continues until the list is completely empty.
An iterative clustering method that can be used for image segmentation. The algorithm randomly selects K centroids, then assigns each pixel to the closest cluster based on pixel intensity or other features. Then, the algorithm calculates new centroids based on the average position of the pixels in each cluster. This process is repeated until cluster centers no longer change significantly. Pixels with similar properties are grouped into clusters so that different regions can be separated.
There are several other methods for segmenting images. The Watershed algorithm is used for segmenting complex, touching objects. Unsupervised machine learning methods such as DBSCAN or Mean-Shift do not require a predefined number of clusters. Image segmentation with Convolutional Neural Networks (CNNs) is particularly popular nowadays, as they can recognize learned objects when fed large amounts of data.