## Issue

I have a bespoke NN model that works, and I wanted to move it to the PyTorch framework. However, the network is not training, most likely due to some misconfiguration. Please advise if you see anything odd or wrong that could be a contributing reason.

```
import torch
from torch import nn, optim
import torch.nn.functional as F

X_train_t = torch.tensor(X_train).float()
X_test_t = torch.tensor(X_test).float()
# reshape the labels to column vectors of shape (N, 1)
y_train_t = torch.tensor(y_train).long().reshape(-1, 1)
y_test_t = torch.tensor(y_test).long().reshape(-1, 1)


class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(22, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.log_softmax(self.fc2(x), dim=1)
        return x


model = Classifier()
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.003)

epochs = 2000
steps = 0
train_losses, test_losses = [], []

for e in range(epochs):
    # training loss
    optimizer.zero_grad()
    log_ps = model(X_train_t)
    loss = criterion(log_ps, y_train_t.type(torch.float32))
    loss.backward()
    optimizer.step()
    train_loss = loss.item()

    # test loss
    # Turn off gradients for validation, saves memory and computations
    with torch.no_grad():
        log_ps = model(X_test_t)
        test_loss = criterion(log_ps, y_test_t.to(torch.float32))
        ps = torch.exp(log_ps)

    train_losses.append(train_loss/len(X_train_t))
    test_losses.append(test_loss/len(X_test_t))

    if (e % 100 == 0):
        print("Epoch: {}/{}.. ".format(e, epochs),
              "Training Loss: {:.3f}.. ".format(train_loss/len(X_train_t)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(X_test_t)))
```

Training is not happening:

```
Epoch: 0/2000.. Training Loss: 0.014.. Test Loss: 0.082..
Epoch: 100/2000.. Training Loss: 0.014.. Test Loss: 0.082..
...
```

## Solution

The source of your problem is that you apply the *softmax* operation to the output of `self.fc2`. That output has a size of 1, and therefore the output of the *softmax* will be 1 regardless of the input (and the `F.log_softmax` you actually call will always return log(1) = 0). You can read more about the softmax activation function in the PyTorch documentation. I suspect that you wanted to use the sigmoid function to map the output of the last linear layer to the interval [0,1] and then apply a log function of some sort.
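To make the arithmetic concrete, here is a small sketch in plain Python (deliberately not PyTorch, so it only illustrates the math). The `softmax` helper is written just for this illustration; it shows that a softmax over a single value is 1 no matter what that value is, so its log is 0.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One logit per sample, as produced by a layer like nn.Linear(10, 1).
single_logit_outputs = [softmax([z]) for z in (-50.0, -1.0, 0.0, 3.7, 120.0)]
log_outputs = [[math.log(p) for p in row] for row in single_logit_outputs]

print(single_logit_outputs)  # every row is [1.0]
print(log_outputs)           # every row is [0.0]
```

With more than one logit per sample the outputs would differ from sample to sample; the constant output is specific to normalizing over a dimension of size 1.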

Because the *softmax* produces an output of 1 regardless of the input, the output of the network does not depend on its weights, no gradient reaches them, and the model cannot train. I do not have access to your data, so I cannot reproduce this exactly, but from the information I have, replacing the *softmax* activation with a *sigmoid* should solve it.

A better and more numerically stable approach is to use `nn.BCEWithLogitsLoss` in place of the criterion in `criterion = nn.BCELoss()` and to remove the activation function at the end of the network, since this criterion applies the sigmoid together with the BCE loss in a single, more numerically stable computation.
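As a rough illustration of why fusing the sigmoid into the loss helps, the sketch below compares the two approaches in plain Python. Both functions are hypothetical helpers written for this example and are not PyTorch's actual implementation; the stable one uses the common rewriting `max(x, 0) - x*y + log(1 + exp(-|x|))`.

```python
import math


def bce_naive(logit, target):
    """Sigmoid first, then binary cross-entropy on the probability."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def bce_with_logits(logit, target):
    """Same loss computed directly from the logit, without forming p."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# For moderate logits the two formulas agree.
moderate_gap = abs(bce_naive(0.5, 1.0) - bce_with_logits(0.5, 1.0))

# For a confident wrong prediction the sigmoid rounds to exactly 1.0,
# so the naive version ends up taking log(0).
try:
    naive_value = bce_naive(40.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True

stable_value = bce_with_logits(40.0, 0.0)
print(moderate_gap, naive_failed, stable_value)
```

The logit of 40 is an arbitrary choice for the demonstration; any logit large enough for the sigmoid to saturate in floating point behaves the same way.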

To summarize, my advice is to change `criterion = nn.BCELoss()` to `criterion = nn.BCEWithLogitsLoss()` and to change the forward function as follows:

```
def forward(self, x):
    # make sure input tensor is flattened
    x = x.view(x.shape[0], -1)
    x = F.relu(self.fc1(x))
    # return raw logits; BCEWithLogitsLoss applies the sigmoid itself
    x = self.fc2(x)
    return x
```
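For reference, here is a minimal end-to-end sketch of the suggested change. The original data is not available, so it trains on synthetic data: the random features, the labeling rule, the learning rate of 0.1, and the 200 epochs are all assumptions made only for this illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-in for the real data: 22 features, binary labels.
X = torch.randn(200, 22)
y = (X[:, 0] + X[:, 1] > 0).float().reshape(-1, 1)  # float targets, shape (N, 1)


class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(22, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw logits, no final activation


model = Classifier()
criterion = nn.BCEWithLogitsLoss()  # sigmoid + BCE in one step
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# At inference time, apply the sigmoid explicitly to get probabilities.
with torch.no_grad():
    probs = torch.sigmoid(model(X))

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Note that the model now returns logits, so probabilities have to be recovered with `torch.sigmoid` when they are needed (replacing the `torch.exp(log_ps)` step from the question).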

Answered By – Tomer Geva
