Focal Loss was introduced by Lin et al

Publicat el 22/06/2022

Sopra this case, the activation function does not depend sopra scores of other classes durante \(C\) more than \(C_1 = C_i\). So the gradient respect onesto the each risultato \(s_i\) durante \(s\) will only depend on the loss given by its binary problem.

Caffe: Sigmoid Cross-Entropy Loss Layer
Pytorch: BCEWithLogitsLoss
TensorFlow: sigmoid_cross_entropy.

Focal Loss

, from Facebook, con this paper. They claim esatto improve one-tirocinio object detectors using Focal Loss onesto train verso detector they name RetinaNet. Focal loss is verso Ciclocampestre-Entropy Loss that weighs the contribution of each sample esatto the loss based sopra the classification error. The preoccupazione is that, if a sample is already classified correctly by the blk CNN, its contribution to the loss decreases. With this strategy, they claim to solve the problem of class imbalance by making the loss implicitly focus durante those problematic classes. Moreover, they also weight the contribution of each class puro the lose per a more explicit class balancing. They use Sigmoid activations, so Focal loss could also be considered a Binary Ciclocampestre-Entropy Loss. We define it for each binary problem as:

Where \((1 – s_i)\gamma\), with the focusing parameter \(\tipo >= 0\), is a modulating factor esatto veterano the influence of correctly classified samples durante the loss. With \(\modo = 0\), Focal Loss is equivalent esatto Binary Cross Entropy Loss.

Where we have separated formulation for when the class \(C_i = C_1\) is positive or negative (and therefore, the class \(C_2\) is positive). As before, we have \(s_2 = 1 – s_1\) and \(t2 = 1 – t_1\).

The gradient gets a bit more complex due preciso the inclusion of the modulating factor \((1 – s_i)\gamma\) mediante the loss formulation, but it can be deduced using the Binary Cross-Entropy gradient expression.

Where \(f()\) is the sigmoid function. Onesto get the gradient expression for a negative \(C_i (t_i = 0\)), we just need puro replace \(f(s_i)\) with \((1 – f(s_i))\) con the expression above.

Topo that, if the modulating factor \(\genere = 0\), the loss is equivalent preciso the CE Loss, and we end up with the same gradient expression.

Forward pass: Loss computation

Where logprobs[r] stores, verso each element of the batch, the sum of the binary ciclocampestre entropy verso each class. The focusing_parameter is \(\gamma\), which by default is 2 and should be defined as a layer parameter durante the net prototxt. The class_balances can be used onesto introduce different loss contributions per class, as they do durante the Facebook paper.

Backward pass: Gradients computation

Per the specific (and usual) case of Multi-Class classification the labels are one-hot, so only the positive class \(C_p\) keeps its term durante the loss. There is only one element of the Target vector \(t\) which is not zero \(t_i = t_p\). So discarding the elements of the summation which are niente paio sicuro target labels, we can write:

This would be the pipeline for each one of the \(C\) clases. We set \(C\) independent binary classification problems \((C’ = 2)\). Then we sum up the loss over the different binary problems: We sum up the gradients of every binary problem preciso backpropagate, and the losses preciso videoclip the global loss. \(s_1\) and \(t_1\) are the punteggio and the gorundtruth label for the class \(C_1\), which is also the class \(C_i\) sopra \(C\). \(s_2 = 1 – s_1\) and \(t_2 = 1 – t_1\) are the risultato and the groundtruth label of the class \(C_2\), which is not a “class” sopra our original problem with \(C\) classes, but a class we create to attrezzi up the binary problem with \(C_1 = C_i\). We can understand it as a preparazione class.

Focal Loss was introduced by Lin et al

Focal Loss

Forward pass: Loss computation

Backward pass: Gradients computation

Deixa un comentari Cancel·la les respostes