Classification I. - Image processing

Binary classification

If the elements of two classes need to be distinguished, the task is called binary classification. Logistic regression is a classic model for this task, which is why the two terms are sometimes used interchangeably, although strictly speaking one names the task and the other a method. The output is a number between 0 and 1: the closer it is to either extreme, the more confident the model is in its decision. If the output is very close to 0.5, the model is not confident.
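As a minimal sketch of how such an output can be read, the snippet below squashes a raw model score with a sigmoid and turns it into a class and a confidence value. The function names and the 0.5 decision threshold are assumptions made for illustration, not something fixed by the text above.

```python
import math

def sigmoid(z):
    """Map a real-valued score to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(z, threshold=0.5):
    """Return (predicted class, confidence in that class).

    The threshold of 0.5 is an assumed default; other values are possible.
    """
    p = sigmoid(z)                              # estimated probability of class 1
    label = 1 if p >= threshold else 0
    confidence = p if label == 1 else 1.0 - p   # distance from the "other" extreme
    return label, confidence

confident_label, confident_conf = predict(4.0)   # output near 1: confident
unsure_label, unsure_conf = predict(0.1)         # output near 0.5: not confident
```

An output of exactly 0.5 gives a confidence of 0.5, which is the least confident the model can be under this reading.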

CNN: Convolution

Convolutional neural networks, as their name suggests, contain convolutional layers, each with a given kernel size and number of kernels. For images, IMU sensor output, and other data with local spatial or temporal structure, these networks can learn to recognize local patterns and features such as edges, textures, and temporal or spatial patterns automatically. Because the convolutional layers are stacked on top of each other, the network learns increasingly complex patterns, which makes it suitable for image processing, motion detection, and many similar tasks.
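To make the idea of a kernel responding to a local pattern concrete, here is a small sketch using NumPy. It slides a kernel over a single-channel image without padding or stride (the operation deep learning libraries usually call convolution, which is technically a cross-correlation). The function name, the toy image, and the hand-picked edge kernel are assumptions for illustration; in a real network the kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel and the patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x4 image: dark left half (0), bright right half (1)
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# Hand-picked kernel that responds to a dark-to-bright step from left to right
kernel = np.array([[-1.0, 1.0]])

response = conv2d_valid(image, kernel)  # non-zero only where the edge is
```

The response is zero in the flat regions and 1 in the column where the brightness changes, which is the sense in which a kernel detects a local feature.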

CNN: Max pooling

Convolutional layers are often followed by max pooling layers. A 2x2 max pooling halves both the width and the height of an image by splitting it into non-overlapping 2x2 areas and keeping only the maximum of the 4 elements in each area. The result is a quarter of the data, while the strongest response in each area, and with it the presence of the detected feature, is preserved.
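The 2x2 max pooling described above can be sketched in a few lines of NumPy. The function name and the example values are assumptions for illustration; odd trailing rows or columns are simply dropped here, which is one possible convention among several.

```python
import numpy as np

def max_pool_2x2(x):
    """Take the maximum of each non-overlapping 2x2 area of a 2D array."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    x = x[:h2 * 2, :w2 * 2]                    # drop an odd last row/column, if any
    # Axes 1 and 3 index the position inside each 2x2 area
    return x.reshape(h2, 2, w2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
# pooled is [[4, 2],
#            [2, 8]]
```

The 4x4 input becomes a 2x2 output: 16 values are reduced to 4, and each output value is the largest one from its area.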

Batch size

Most of the time it is not possible to feed the entire training dataset to the algorithm in one go, due to the large size and memory limitations of computers. The solution is to divide the data into small groups, called batches. The batch size parameter is the number of samples in each of these small groups.
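Splitting a dataset into batches can be sketched as follows. The helper name is an assumption for illustration, and the sketch leaves the last batch smaller when the dataset size is not a multiple of the batch size (some pipelines drop it instead).

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each with at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

dataset = list(range(10))                        # stand-in for 10 training samples
batches = list(iterate_batches(dataset, batch_size=4))
# batches is [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```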

Epoch

Training is an iterative process. In an epoch, the number of iterations is equal to the number of batches that together make up the entire dataset. The entire dataset flows through the network once in each epoch, and the weights, biases and convolution kernels are tuned after every batch. Ideally, the accuracy increases with each epoch, although in practice it can fluctuate or level off.
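The relationship between dataset size, batch size, iterations and epochs is simple arithmetic. The numbers below (1000 samples, batch size 32, 5 epochs) are made-up values for illustration, and the sketch assumes the last, partial batch is also used.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover the whole dataset once."""
    return math.ceil(num_samples / batch_size)

num_samples, batch_size, epochs = 1000, 32, 5

per_epoch = iterations_per_epoch(num_samples, batch_size)  # 31 full batches + 1 partial = 32
total_iterations = per_epoch * epochs                      # 32 * 5 = 160 weight updates
```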

Confusion matrix

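The performance of the model on the test dataset can be easily displayed in a matrix, where each class appears on both the horizontal and the vertical axis. The rows are the correct classes, the columns are the classes predicted by the model, and each element of the matrix is the number of test samples with that combination of correct and predicted class. Correct predictions lie on the main diagonal, and every other element is a specific kind of mistake.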
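A confusion matrix with rows as correct classes and columns as predicted classes can be built by counting pairs, as in the sketch below. The function name and the six example labels are assumptions for illustration; classes are assumed to be encoded as integers starting at 0.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (correct class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_class, predicted_class in zip(y_true, y_pred):
        matrix[true_class, predicted_class] += 1   # row = correct, column = predicted
    return matrix

y_true = [0, 0, 1, 1, 1, 0]   # made-up correct labels
y_pred = [0, 1, 1, 1, 0, 0]   # made-up model predictions

cm = confusion_matrix(y_true, y_pred, num_classes=2)
# cm is [[2, 1],
#        [1, 2]]

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
```

Here 4 of the 6 samples lie on the diagonal, so the accuracy is 4/6.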

Learning curves

The learning curves show how model performance develops during training. They plot accuracy and loss as a function of training iterations or epochs. The curves drawn for the training and validation datasets can be used to recognize underfitting (both curves stay poor) and overfitting (the training curve keeps improving while the validation curve stalls or gets worse). Based on them, one can decide whether more data, regularization, or early stopping of training is needed, and whether it is worth experimenting with a deeper or simpler structure.
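As a very rough sketch of how the two curves can be read programmatically, the function below compares the final training and validation loss and finds the epoch with the lowest validation loss (a natural early-stopping point). The function name, both thresholds, and the loss values are arbitrary assumptions for illustration; in practice the curves are usually inspected visually.

```python
def diagnose(train_loss, val_loss, gap_tolerance=0.1, high_loss=0.5):
    """Heuristic reading of per-epoch loss curves; thresholds are arbitrary."""
    # Epoch (0-based) where validation loss was lowest
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_train, final_val = train_loss[-1], val_loss[-1]

    if final_train > high_loss and final_val > high_loss:
        return "underfitting", best_epoch       # model is poor even on training data
    if final_val - final_train > gap_tolerance and best_epoch < len(val_loss) - 1:
        return "overfitting", best_epoch        # validation got worse after best_epoch
    return "reasonable fit", best_epoch

# Made-up curves: training loss keeps falling, validation loss turns upward
train_loss = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_loss = [0.95, 0.70, 0.55, 0.50, 0.58, 0.70]

verdict, stop_at = diagnose(train_loss, val_loss)
```

For these curves the verdict is overfitting, with the lowest validation loss at the fourth epoch (index 3), which is where early stopping would have kept the model.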