Can neural networks handle redundant inputs?

Issue

I have a fully connected neural network with the following number of neurons in each layer [4, 20, 20, 20, ..., 1]. I am using TensorFlow and the 4 real-valued inputs correspond to a particular point in space and time, i.e. (x, y, z, t), and the 1 real-valued output corresponds to the temperature at that point. The loss function is just the mean square error between my predicted temperature and the actual temperature at that point in (x, y, z, t). I have a set of training data points with the following structure for their inputs:


(x,y,z,t):

(0.11,0.12,1.00,0.41)

(0.34,0.43,1.00,0.92)

(0.01,0.25,1.00,0.65)

(0.71,0.32,1.00,0.49)

(0.31,0.22,1.00,0.01)

(0.21,0.13,1.00,0.71)


Namely, what you will notice is that the training data all have the same redundant value in z, but x, y, and t are generally not redundant. Yet what I find is my neural network cannot train on this data due to the redundancy. In particular, every time I start training the neural network, it appears to fail and the loss function becomes nan. But, if I change the structure of the neural network such that the number of neurons in each layer is [3, 20, 20, 20, ..., 1], i.e. now data points only correspond to an input of (x, y, t), everything works perfectly and training is all right. But is there any way to overcome this problem? (Note: it occurs whether any of the variables are identical, e.g. either x, y, or t could be redundant and cause this error.) I have also attempted different activation functions (e.g. ReLU) and varying the number of layers and neurons in the network, but these changes do not resolve the problem.

My question: is there any way to still train the neural network while keeping the redundant z as an input? It just so happens the particular training data set I am considering at the moment has all z redundant, but in general, I will have data coming from different z in the future. Therefore, a way to ensure the neural network can robustly handle inputs at the present moment is sought.

A minimal working example is encoded below. When running this example, the loss output is nan, but if you simply uncomment the x_z in line 12 to ensure there is now variation in x_z, then there is no longer any problem. But this is not a solution since the goal is to use the original x_z with all constant values.

import numpy as np 
import tensorflow as tf

end_it = 10000 #number of iterations
frac_train = 1.0 #randomly sampled fraction of data to create training set
frac_sample_train = 0.1 #randomly sampled fraction of data from training set to train in batches
layers = [4, 20, 20, 20, 20, 20, 20, 20, 20, 1]
len_data = 10000
x_x = np.array([np.linspace(0.,1.,len_data)])
x_y = np.array([np.linspace(0.,1.,len_data)])
x_z = np.array([np.ones(len_data)*1.0])
#x_z = np.array([np.linspace(0.,1.,len_data)])
x_t = np.array([np.linspace(0.,1.,len_data)])
y_true = np.array([np.linspace(-1.,1.,len_data)])

N_train = int(frac_train*len_data)
idx = np.random.choice(len_data, N_train, replace=False)

x_train = x_x.T[idx,:]
y_train = x_y.T[idx,:]
z_train = x_z.T[idx,:]
t_train = x_t.T[idx,:]
v1_train = y_true.T[idx,:] 

sample_batch_size = int(frac_sample_train*N_train)

np.random.seed(1234)
tf.set_random_seed(1234)
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
tf.logging.set_verbosity(tf.logging.ERROR)

class NeuralNet:
    def __init__(self, x, y, z, t, v1, layers):
        X = np.concatenate([x, y, z, t], 1)  
        self.lb = X.min(0)
        self.ub = X.max(0)
        self.X = X
        self.x = X[:,0:1]
        self.y = X[:,1:2]
        self.z = X[:,2:3]
        self.t = X[:,3:4]
        self.v1 = v1 
        self.layers = layers 
        self.weights, self.biases = self.initialize_NN(layers) 
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=False,
                                                     log_device_placement=False)) 
        self.x_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.y_tf = tf.placeholder(tf.float32, shape=[None, self.y.shape[1]])
        self.z_tf = tf.placeholder(tf.float32, shape=[None, self.z.shape[1]])
        self.t_tf = tf.placeholder(tf.float32, shape=[None, self.t.shape[1]])
        self.v1_tf = tf.placeholder(tf.float32, shape=[None, self.v1.shape[1]])  
        self.v1_pred = self.net(self.x_tf, self.y_tf, self.z_tf, self.t_tf) 
        self.loss = tf.reduce_mean(tf.square(self.v1_tf - self.v1_pred)) 
        self.optimizer = tf.contrib.opt.ScipyOptimizerInterface(self.loss,
                                                                method = 'L-BFGS-B',
                                                                options = {'maxiter': 50,
                                                                           'maxfun': 50000,
                                                                           'maxcor': 50,
                                                                           'maxls': 50,
                                                                           'ftol' : 1.0 * np.finfo(float).eps})
        init = tf.global_variables_initializer()  
        self.sess.run(init)
    def initialize_NN(self, layers):
        weights = []
        biases = []
        num_layers = len(layers)
        for l in range(0,num_layers-1):
            W = self.xavier_init(size=[layers[l], layers[l+1]])
            b = tf.Variable(tf.zeros([1,layers[l+1]], dtype=tf.float32), dtype=tf.float32)
            weights.append(W)
            biases.append(b) 
        return weights, biases
    def xavier_init(self, size):
        in_dim = size[0]
        out_dim = size[1]
        xavier_stddev = np.sqrt(2/(in_dim + out_dim)) 
        return tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
    def neural_net(self, X, weights, biases):
        num_layers = len(weights) + 1
        H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
        for l in range(0,num_layers-2):
            W = weights[l]
            b = biases[l]
            H = tf.tanh(tf.add(tf.matmul(H, W), b))
        W = weights[-1]
        b = biases[-1]
        Y = tf.add(tf.matmul(H, W), b) 
        return Y
    def net(self, x, y, z, t): 
        v1_out = self.neural_net(tf.concat([x,y,z,t], 1), self.weights, self.biases)
        v1 = v1_out[:,0:1]
        return v1
    def callback(self, loss):
        global Nfeval
        print(str(Nfeval)+' - Loss in loop: %.3e' % (loss))
        Nfeval += 1
    def fetch_minibatch(self, x_in, y_in, z_in, t_in, den_in, N_train_sample):  
        idx_batch = np.random.choice(len(x_in), N_train_sample, replace=False)
        x_batch = x_in[idx_batch,:]
        y_batch = y_in[idx_batch,:]
        z_batch = z_in[idx_batch,:]
        t_batch = t_in[idx_batch,:]
        v1_batch = den_in[idx_batch,:] 
        return x_batch, y_batch, z_batch, t_batch, v1_batch
    def train(self, end_it):  
        it = 0
        while it < end_it: 
            x_res_batch, y_res_batch, z_res_batch, t_res_batch, v1_res_batch = self.fetch_minibatch(self.x, self.y, self.z, self.t, self.v1, sample_batch_size) # Fetch residual mini-batch
            tf_dict = {self.x_tf: x_res_batch, self.y_tf: y_res_batch, self.z_tf: z_res_batch, self.t_tf: t_res_batch,
                       self.v1_tf: v1_res_batch}
            self.optimizer.minimize(self.sess,
                                    feed_dict = tf_dict,
                                    fetches = [self.loss],
                                    loss_callback = self.callback) 
    def predict(self, x_star, y_star, z_star, t_star): 
        tf_dict = {self.x_tf: x_star, self.y_tf: y_star, self.z_tf: z_star, self.t_tf: t_star}
        v1_star = self.sess.run(self.v1_pred, tf_dict)  
        return v1_star

model = NeuralNet(x_train, y_train, z_train, t_train, v1_train, layers)

Nfeval = 1
model.train(end_it)

Solution

I think your problem is in this line:

H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0

In the third column fo X, corresponding to the z variable, both self.lb and self.ub are the same value, and equal to the value in the example, in this case 1, so it is acutally computing:

2.0*(1.0 - 1.0)/(1.0 - 1.0) - 1.0 = 2.0*0.0/0.0 - 1.0

Which is nan. You can work around the issue in a few different ways, a simple option is to simply do:

# Avoids dividing by zero
X_d = tf.math.maximum(self.ub - self.lb, 1e-6)
H = 2.0*(X - self.lb)/X_d - 1.0

Answered By – jdehesa

Answer Checked By – Cary Denson (AngularFixing Admin)

Leave a Reply

Your email address will not be published.