Difference between Keras model.fit using only batch_size and using only steps_per_epoch

Issue

When I run model.fit using both the batch_size and steps_per_epoch parameters, I receive the following error:

ValueError: If steps_per_epoch is set, the `batch_size` must be None.

So, from this error and from the following piece of the Keras Model (functional API) documentation:

batch_size: Integer or None. Number of samples per gradient update. If
unspecified, batch_size will default to 32.

steps_per_epoch: Integer or None. Total number of steps (batches of samples)
before declaring one epoch finished and starting the next epoch. When training
with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined.

I understand that both parameters are somehow equivalent. But, on my laptop (with a GeForce 940M graphics card with 2 GB of VRAM, training on the CIFAR-10 dataset), when I run model.fit with batch_size set to 256, the script runs fine and the feedback Keras gives looks like this:

4608/50000 [=>............................] - ETA: 1:59 - loss: 0.8167 - acc: 0.7398

with the first number increasing by 256 each step. However, when I instead pass steps_per_epoch as number_train//batch_size, I run out of memory and cannot run my script unless I pass a batch_size of 1.
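To make the setup concrete, this is the arithmetic behind the asker's steps_per_epoch value (a sketch: CIFAR-10 has 50,000 training samples, and a batch size of 256 is assumed from the progress bar above; number_train is the asker's own variable name):

```python
# Hypothetical reconstruction of the asker's setup.
num_train = 50_000   # CIFAR-10 training samples
batch_size = 256     # inferred from the progress-bar increments

steps_per_epoch = num_train // batch_size
print(steps_per_epoch)  # 195
```

So the asker is asking fit() to run 195 steps per epoch; the surprise, explained in the answer below, is how large each of those steps turns out to be.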

So, how does model.fit work with these parameters? What is the difference between using one of them rather than the other?

Solution

That’s a good question. What I observe from the source code ([1] and [2]) is that:

  • When you set batch_size, the training data is sliced into batches of this size (see L184).
  • When you set steps_per_epoch, if the training inputs are not framework-native tensors (the most common case), the whole training set is fed into the network as a single batch (see L152), and that is why you get the memory error.
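The two code paths can be imitated with a toy batching function (a pure-Python illustration of the behavior described above, not the actual Keras source):

```python
import numpy as np

def batches(n_samples, batch_size=None, steps_per_epoch=None):
    """Toy sketch of the two fit() code paths for array inputs:
    batch_size slices the data into chunks, while steps_per_epoch
    (with plain arrays) effectively yields the whole set each step."""
    if steps_per_epoch is not None:
        # Array inputs + steps_per_epoch: every "batch" is the full dataset.
        return [np.arange(n_samples)] * steps_per_epoch
    batch_size = batch_size or 32  # Keras default
    return [np.arange(i, min(i + batch_size, n_samples))
            for i in range(0, n_samples, batch_size)]

# batch_size=256 on 50,000 samples -> 196 slices of at most 256 samples
print(len(batches(50_000, batch_size=256)))          # 196
# steps_per_epoch on the same data -> each step sees all 50,000 samples
print(len(batches(50_000, steps_per_epoch=195)[0]))  # 50000
```

This is exactly why the memory footprint explodes: instead of 256 samples per gradient update, every step tries to push all 50,000 images through the GPU at once.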

Therefore, based on the implementation, I would advise using the steps_per_epoch argument only when feeding framework-native tensors (i.e. TensorFlow tensors whose first dimension is the batch size), and that is indeed a requirement. In order to do this, the x and y arguments of model.fit need to be set to None.
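As a sketch of that advice in today's API (this assumes modern tf.keras with a tf.data.Dataset, which is the current idiom for "framework-native tensors"; the original answer targeted standalone Keras, where the mechanics differed but the principle is the same — the input pipeline carries the batch dimension, so batch_size stays None):

```python
import numpy as np
import tensorflow as tf

# Dummy data standing in for a real training set (hypothetical shapes).
x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

# The dataset itself is batched, so fit() no longer slices the data;
# it simply draws `steps_per_epoch` batches per epoch.
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(25).repeat()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 100 samples / batches of 25 -> 4 steps per epoch; batch_size is not passed.
history = model.fit(dataset, steps_per_epoch=4, epochs=1, verbose=0)
```

Here memory usage is bounded by the 25-sample batches the dataset emits, not by the full training set.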

Answered By – rvinas

