Split dataset into training and test sets by month

Issue

I could not find an answer to this anywhere. I have three months of data and would like to use the first two months ('Jan-19', 'Feb-19') as the training set and the last month ('Mar-19') as the test set.

Previously I have done random sampling with simple code like this:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=109)

Before that, I assigned y as the label and X as the columns used for prediction. I'm not sure how to assign the training and test sets to the months I want.

Thank you

Solution

If your data is in a pandas DataFrame and X has a 'month' column with values like 'Mar-19', you can split with boolean indexing like this:

# Boolean mask: True for rows that belong to the test month
is_test = X['month'] == 'Mar-19'

X_train = X[~is_test]
y_train = y[~is_test]

X_test = X[is_test]
y_test = y[is_test]
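
For a self-contained illustration of the same idea, here is a minimal sketch on toy data. The column names 'month', 'feature', and 'label' and the values are made up for the example and do not come from the question. It also drops the 'month' column after splitting, on the assumption that you do not want the month string itself used as a predictor.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "month":   ["Jan-19", "Jan-19", "Feb-19", "Feb-19", "Mar-19", "Mar-19"],
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "label":   [0, 1, 0, 1, 0, 1],
})

X = df[["month", "feature"]]
y = df["label"]

# Boolean mask that is True for rows in the test month(s).
# isin() makes it easy to hold out more than one month later on.
is_test = X["month"].isin(["Mar-19"])

# Split on the mask, then drop 'month' so it is not treated as a feature.
X_train = X.loc[~is_test].drop(columns="month")
X_test = X.loc[is_test].drop(columns="month")
y_train = y.loc[~is_test]
y_test = y.loc[is_test]

print(X_train.shape, X_test.shape)  # (4, 1) (2, 1)
```

Because X and y share the same index, the mask built from X can be applied to y directly. If the month column is needed as a feature, skip the drop and encode it instead.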

Answered By – josephjscheidt

Answer Checked By – Willingham (AngularFixing Volunteer)
