Understanding the Limitations of Binary Cross-Entropy and the Advantages of Focal Loss in Imbalanced Classification
Binary cross-entropy (BCE) is widely used as the standard loss function for binary classification tasks. However, its effectiveness diminishes significantly when applied to datasets with severe class imbalance. The core issue lies in the fact that BCE treats errors from both classes with equal importance, regardless of how infrequent one class might be.
Why Binary Cross-Entropy Struggles with Imbalanced Data
Consider two prediction scenarios: one where a rare positive instance (minority class) with a true label of 1 is predicted with a probability of 0.3, and another where a common negative instance (majority class) with a true label of 0 is predicted at 0.7. Both cases yield the same BCE loss value of -log(0.3). But should these errors be penalized equally? In datasets where one class dominates, misclassifying the minority class is far more detrimental, yet BCE does not differentiate between these mistakes.
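The symmetry is easy to verify numerically. A minimal sketch of the two scenarios described above:

```python
import math

# Rare positive (true label 1) predicted at 0.3
loss_pos = -math.log(0.3)

# Common negative (true label 0) predicted at 0.7
loss_neg = -math.log(1 - 0.7)

print(loss_pos, loss_neg)  # both equal -log(0.3) ≈ 1.204
```

BCE assigns both mistakes identical loss, even though the missed positive may matter far more.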
Introducing Focal Loss: A Solution for Imbalanced Classification
Focal Loss addresses this imbalance by diminishing the influence of well-classified, easy examples and emphasizing the harder, often minority-class samples. This mechanism enables the model to concentrate on learning the subtle patterns of the underrepresented class rather than being overwhelmed by the majority class. This approach has gained traction in fields like medical imaging and fraud detection, where minority classes are critical yet scarce.
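The down-weighting comes from the modulating factor (1 - pt)^gamma in the standard focal formulation, -alpha * (1 - pt)^gamma * log(pt), where pt is the predicted probability of the true class (the same form implemented later in this article). A quick illustration of how strongly it suppresses easy examples:

```python
import math

def focal_term(pt, gamma=2.0, alpha=0.25):
    # (1 - pt)^gamma shrinks the loss when pt is close to 1 (easy example)
    return -alpha * (1 - pt) ** gamma * math.log(pt)

easy = focal_term(0.9)  # confidently correct: heavily down-weighted
hard = focal_term(0.3)  # misclassified: retains most of its loss
print(hard / easy)      # the hard example dominates by orders of magnitude
```

With gamma = 2, the hard example contributes hundreds of times more loss than the easy one, whereas under BCE the ratio would be only about 11.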
Setting Up the Experiment: Generating an Imbalanced Dataset
To illustrate the difference between BCE and Focal Loss, we generate a synthetic binary classification dataset with a pronounced 99:1 class imbalance using 6,000 samples. This setup mimics real-world scenarios such as rare disease detection, where the positive cases are extremely limited compared to negatives.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
# Create a highly imbalanced dataset
X, y = make_classification(
    n_samples=6000,
    n_features=2,
    n_redundant=0,
    n_clusters_per_class=1,
    weights=[0.99, 0.01],
    class_sep=1.5,
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)
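It is worth confirming the skew the generator actually produced; because make_classification applies a small amount of label noise by default (flip_y), the split will not be exactly 99:1. A self-contained check:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=6000, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99, 0.01],
    class_sep=1.5, random_state=42
)

# Count samples per class to verify the heavy imbalance
counts = np.bincount(y)
print(counts)  # majority class vastly outnumbers the minority
```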
Designing a Simple Neural Network Architecture
We implement a straightforward neural network with two hidden layers to maintain focus on the impact of the loss functions rather than model complexity. This architecture is sufficient to capture the decision boundary in our two-dimensional feature space and clearly demonstrate the contrasting behaviors of BCE and Focal Loss.
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.network(x)
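A quick shape check confirms what the final sigmoid layer guarantees: one probability per sample, bounded in (0, 1). This sketch rebuilds the same stack as a bare nn.Sequential so it runs standalone:

```python
import torch
import torch.nn as nn

# Same architecture as SimpleNN above, repeated here for a standalone check
model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

out = model(torch.randn(5, 2))
print(out.shape)  # one column of probabilities, one row per sample
```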
Implementing Focal Loss for Enhanced Minority Class Learning
The Focal Loss function modifies the traditional BCE by applying a modulating factor that down-weights easy examples and focuses training on difficult, misclassified samples. The parameter gamma controls the rate at which easy examples are suppressed, while alpha balances the importance of the minority class. This tailored loss function is particularly effective in domains like anomaly detection, where rare events must be identified accurately.
class FocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, preds, targets):
        # Clamp predictions away from 0 and 1 to avoid log(0)
        epsilon = 1e-7
        preds = torch.clamp(preds, epsilon, 1 - epsilon)
        # pt is the predicted probability of the true class
        pt = torch.where(targets == 1, preds, 1 - preds)
        loss = -self.alpha * (1 - pt) ** self.gamma * torch.log(pt)
        return loss.mean()
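A useful sanity check on this implementation: with alpha = 1 and gamma = 0 the modulating factor becomes 1, so Focal Loss should reduce to plain BCE. The class is repeated here so the check runs standalone:

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    # Same definition as above
    def __init__(self, alpha=0.25, gamma=2):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, preds, targets):
        epsilon = 1e-7
        preds = torch.clamp(preds, epsilon, 1 - epsilon)
        pt = torch.where(targets == 1, preds, 1 - preds)
        loss = -self.alpha * (1 - pt) ** self.gamma * torch.log(pt)
        return loss.mean()

preds = torch.tensor([[0.9], [0.3], [0.6]])
targets = torch.tensor([[1.0], [1.0], [0.0]])

# With alpha=1, gamma=0 the modulating factor vanishes -> plain BCE
focal = FocalLoss(alpha=1.0, gamma=0)(preds, targets)
bce = nn.BCELoss()(preds, targets)
print(float(focal), float(bce))  # the two values agree
```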
Training and Evaluating Models with BCE and Focal Loss
We train two identical neural networks: one optimized with standard BCE loss and the other with Focal Loss. Both models are trained for 30 epochs using the Adam optimizer. The BCE model reaches a high overall accuracy (~98%), but this figure is misleading: with a 99:1 imbalance, a model that predicts the majority class for every sample would score about the same. The Focal Loss model's slightly higher accuracy (~99%) is more meaningful because the gain comes from correctly classifying more of the rare positive samples rather than from majority-class bias.
def train_model(model, loss_function, learning_rate=0.01, epochs=30):
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    for _ in range(epochs):
        # Full-batch training: the entire training set in each step
        predictions = model(X_train)
        loss = loss_function(predictions, y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        test_predictions = model(X_test)
        accuracy = ((test_predictions > 0.5).float() == y_test).float().mean().item()
    return accuracy, test_predictions.squeeze().numpy()
# Initialize models
model_bce = SimpleNN()
model_focal = SimpleNN()

# Train models
accuracy_bce, preds_bce = train_model(model_bce, nn.BCELoss())
accuracy_focal, preds_focal = train_model(model_focal, FocalLoss(alpha=0.25, gamma=2))

print(f"Test Accuracy with BCE: {accuracy_bce:.4f}")
print(f"Test Accuracy with Focal Loss: {accuracy_focal:.4f}")
Visualizing Decision Boundaries: BCE vs. Focal Loss
The decision boundary learned by the BCE model tends to be nearly flat, predominantly predicting the majority class and neglecting minority instances. This occurs because BCE is heavily influenced by the abundant majority samples. Conversely, the Focal Loss model delineates a more nuanced boundary, effectively capturing minority class regions and demonstrating its superior ability to learn from imbalanced data.
def plot_decision_boundary(model, title):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, 300),
        np.linspace(y_min, y_max, 300)
    )
    grid_points = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        Z = model(grid_points).reshape(xx.shape)

    plt.contourf(xx, yy, Z, levels=[0, 0.5, 1], alpha=0.4, cmap='coolwarm')
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', s=10, edgecolors='k')
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

plot_decision_boundary(model_bce, "Decision Boundary with Binary Cross-Entropy")
plot_decision_boundary(model_focal, "Decision Boundary with Focal Loss")
Confusion Matrix Analysis: Highlighting Minority Class Recognition
Examining the confusion matrices reveals stark differences: the BCE-trained model correctly identifies only a single minority-class instance while misclassifying 27. This reflects its bias toward the majority class. In contrast, the Focal Loss model improves minority class recognition by correctly classifying 14 instances and reducing misclassifications to 14, showcasing its effectiveness in emphasizing challenging samples.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def display_confusion_matrix(true_labels, predicted_labels, title):
    cm = confusion_matrix(true_labels, predicted_labels)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm)
    disp.plot(cmap='Blues', values_format='d')
    plt.title(title)
    plt.show()

y_test_np = y_test.numpy().astype(int)
preds_bce_labels = (preds_bce > 0.5).astype(int)
preds_focal_labels = (preds_focal > 0.5).astype(int)

display_confusion_matrix(y_test_np, preds_bce_labels, "Confusion Matrix - BCE Loss")
display_confusion_matrix(y_test_np, preds_focal_labels, "Confusion Matrix - Focal Loss")
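The difference is even clearer when the confusion-matrix counts reported above are converted into minority-class recall, the metric that accuracy hides:

```python
# Minority-class recall from the confusion-matrix counts reported above
tp_bce, fn_bce = 1, 27        # BCE: one minority hit, 27 misses
tp_focal, fn_focal = 14, 14   # Focal: half of the minority class recovered

recall_bce = tp_bce / (tp_bce + fn_bce)
recall_focal = tp_focal / (tp_focal + fn_focal)
print(recall_bce, recall_focal)  # ≈ 0.036 vs 0.5
```

Recall on the rare class jumps from under 4% to 50%, a gain that a one-point accuracy difference completely obscures.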
By focusing training on difficult, minority-class examples, Focal Loss offers a robust alternative to binary cross-entropy for imbalanced classification problems. This approach is increasingly vital in applications such as fraud detection, rare event prediction, and medical diagnostics, where identifying the minority class accurately is crucial.
