If the model chooses among more than two classes, it assigns a probability to each class rather than a single value between 0 and 1. Together, these probabilities form a probability distribution. The model’s final decision is the class with the highest probability, although the system may abstain from deciding if that probability falls below a chosen threshold.
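As an illustration, here is a minimal NumPy sketch of this decision rule; the softmax function and the 0.6 threshold are assumptions chosen for the example, not values from the text:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def decide(logits, threshold=0.6):
    """Return the predicted class index, or None if the model abstains."""
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else None

# Three-class example: one clear winner, one ambiguous case.
print(decide([2.0, 0.1, -1.0]))  # confident -> 0
print(decide([0.2, 0.1, 0.0]))   # probabilities too close -> None
```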
The goal of augmentation is to artificially expand the training dataset using various transformations; in image processing these include rotation, mirroring, scaling, translation, and the addition of noise. The modified images still belong to the same class, but they help ensure that the model does not memorize the exact patterns of the training samples. Augmentation therefore reduces the risk of overfitting, which is especially valuable when working with small datasets.
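A minimal NumPy sketch of such label-preserving transformations; the specific operations and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Yield simple label-preserving variants of a grayscale image (H x W array)."""
    yield np.fliplr(image)                 # horizontal mirror
    yield np.rot90(image)                  # 90-degree rotation
    yield np.roll(image, shift=2, axis=1)  # small horizontal translation
    noisy = image + rng.normal(0.0, 0.05, image.shape)
    yield np.clip(noisy, 0.0, 1.0)         # additive Gaussian noise

image = rng.random((8, 8))
variants = list(augment(image))
print(len(variants))  # 4 augmented copies, all keeping the original label
```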
By standardizing the outputs (activations) of a layer at the batch level (mean 0, standard deviation 1), training becomes faster and more stable, especially in deeper networks. Batch Normalization first computes the mean and variance of the current batch, then uses these values to standardize the activations. The instability during training is caused by the varying scale and offset of the inputs to each layer; this method corrects for that. In addition, BN introduces two trainable parameters, a scale and a shift, which allow the network to restore or adjust the optimal distribution of the activations.
$ \bar{x}_b = \dfrac{1}{n_b} \cdot \displaystyle\sum_{i=1}^{n_b} x_i $
$ \sigma_b^2 = \dfrac{1}{n_b} \cdot \displaystyle\sum_{i=1}^{n_b} (x_i - \bar{x}_b)^2 $
$ x_i' = \dfrac{x_i - \bar{x}_b}{\sqrt{\sigma_b^2 + \epsilon}} $
$ y_i = \alpha \cdot x_i' + \beta $
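The four equations can be sketched in NumPy as follows; the variable names `alpha` and `beta` mirror the formulas, and `eps` is the small constant $\epsilon$ that prevents division by zero (this shows only the training-time computation, not the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, alpha, beta, eps=1e-5):
    """Batch Normalization over activations x of shape (n_b, features):
    batch mean, batch variance, standardization, then trainable scale/shift."""
    mean = x.mean(axis=0)                    # batch mean
    var = x.var(axis=0)                      # batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardized activations
    return alpha * x_hat + beta              # learned scale and shift

x = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
y = batch_norm(x, alpha=np.ones(2), beta=np.zeros(2))
print(y.mean(axis=0))  # approx. 0 per feature
print(y.std(axis=0))   # approx. 1 per feature
```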
The accuracy of a model is the ratio of correctly predicted samples to the total number of predictions. In the case of binary classification, accuracy is defined as the sum of true positives (TP) and true negatives (TN) divided by the total number of samples. The metric can be computed analogously for multiclass classifiers.
$ \dfrac{TP + TN}{TP + FP + TN + FN} $
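A direct translation of the formula; the confusion counts are hypothetical values for illustration:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for a binary classifier.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```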
Accuracy alone often does not indicate how well a model is performing. It can be important to measure precision: the ratio of correctly predicted positive samples to all samples classified as positive. With multiple classes, precision is computed for each class separately, as the ratio of correctly classified samples to all samples assigned to that class.
$ \dfrac{TP}{TP + FP} $
A metric complementary to precision is recall, which shows what proportion of the truly positive samples the model identified. With multiple classes, it indicates, for each class, the proportion of that class’s samples the model classified correctly. Together these two metrics help explain a model’s behavior: it may show high accuracy yet miss many true positives (low recall) or produce many false positives (low precision).
$ \dfrac{TP}{TP + FN} $
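The following sketch implements both formulas and illustrates the pitfall described above on a hypothetical imbalanced dataset (90 negatives, 10 positives, made up for this example):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that the model identified."""
    return tp / (tp + fn)

# Imbalanced example: 90 negatives, 10 positives.
# The model predicts positive only twice, both times correctly.
tp, fp, fn, tn = 2, 0, 8, 90
print((tp + tn) / (tp + tn + fp + fn))  # accuracy 0.92 looks good...
print(precision(tp, fp))                # precision 1.0
print(recall(tp, fn))                   # ...but recall is only 0.2
```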
F1-score, Top-5 Accuracy, Average Precision, mAP, IoU