PyTorch NN not training

Issue

I have a bespoke NN model that works, and I wanted to move it to the PyTorch framework. However, the network is not training, likely due to some misconfiguration. Please advise if you see something that is odd or wrong, or that could be a contributing reason.

import torch
from torch import nn, optim
import torch.nn.functional as F
X_train_t = torch.tensor(X_train).float()
X_test_t = torch.tensor(X_test).float()
y_train_t = torch.tensor(y_train).long().reshape(y_train.shape[0], 1)
y_test_t = torch.tensor(y_test).long().reshape(y_test.shape[0], 1)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(22, 10)
        self.fc2 = nn.Linear(10, 1)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.log_softmax(self.fc2(x), dim=1)
        
        return x

model = Classifier()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.003)

epochs = 2000
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    # training loss
    optimizer.zero_grad()

    log_ps = model(X_train_t)
    loss = criterion(log_ps, y_train_t.type(torch.float32))
    loss.backward()
    optimizer.step()
    train_loss = loss.item()

    # test loss
    # Turn off gradients for validation, saves memory and computations
    with torch.no_grad():
        log_ps = model(X_test_t)
        test_loss = criterion(log_ps, y_test_t.to(torch.float32))
        ps = torch.exp(log_ps)

    train_losses.append(train_loss/len(X_train_t))
    test_losses.append(test_loss/len(X_test_t))
    
    if (e % 100 == 0):
        print("Epoch: {}/{}.. ".format(e, epochs),
          "Training Loss: {:.3f}.. ".format(train_loss/len(X_train_t)),
          "Test Loss: {:.3f}.. ".format(test_loss/len(X_test_t)))

Training is not happening; the losses stay constant across epochs:

Epoch: 0/2000..  Training Loss: 0.014..  Test Loss: 0.082.. 
Epoch: 100/2000..  Training Loss: 0.014..  Test Loss: 0.082.. 
...

Solution

The source of your problem is that you apply the softmax operation to the output of self.fc2. Since self.fc2 has an output size of 1, the softmax is computed over a single element, so its output is 1 regardless of the input (see the PyTorch documentation for torch.nn.functional.softmax). I suspect that you wanted to use the sigmoid function to map the output of the last linear layer to the interval [0, 1] and then apply a log of some sort.
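
For illustration, here is a minimal snippet (not from the original post) showing what softmax and log_softmax do when the dimension they are applied over has size 1:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 1)           # same shape as the output of self.fc2: (batch, 1)
print(F.softmax(logits, dim=1))      # every entry is 1.0, independent of the logits
print(F.log_softmax(logits, dim=1))  # every entry is 0.0, so the loss gradient carries no signal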

Because the softmax produces an output of 1 regardless of the input, the model cannot train. I do not have access to your data, so I cannot simulate it exactly, but from the information given, replacing the softmax activation with the sigmoid should solve this.
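
If you keep criterion = nn.BCELoss(), the end of the forward pass would look roughly like this (a sketch of the sigmoid variant, not tested against your data):

x = F.relu(self.fc1(x))
x = torch.sigmoid(self.fc2(x))  # maps the single logit to (0, 1), which is what nn.BCELoss expects
return x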

A better and more numerically stable approach is to use nn.BCEWithLogitsLoss instead of nn.BCELoss and to remove the activation function at the end of forward, since this criterion applies the sigmoid together with the BCE loss in a single, numerically more stable computation.

To summarize, my advice is to change criterion = nn.BCELoss() to criterion = nn.BCEWithLogitsLoss() and change the forward function as follows:

def forward(self, x):
    # make sure input tensor is flattened
    x = x.view(x.shape[0], -1)

    x = F.relu(self.fc1(x))
    x = self.fc2(x)  # return raw logits; BCEWithLogitsLoss applies the sigmoid internally

    return x
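
Putting it together, here is a sketch of how the adjusted loss and evaluation step would look (variable names follow the question; the 0.5 decision threshold is my assumption):

criterion = nn.BCEWithLogitsLoss()

logits = model(X_train_t)                    # raw scores from fc2, no activation
loss = criterion(logits, y_train_t.float())  # sigmoid + BCE applied internally

# for evaluation, convert logits to probabilities explicitly
with torch.no_grad():
    probs = torch.sigmoid(model(X_test_t))
    preds = (probs > 0.5).long()             # assumed threshold of 0.5 for class predictions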

Answered By – Tomer Geva

Answer Checked By – David Marino (AngularFixing Volunteer)
