Tag: scikit-learn
Error bars in feature selection increase when using more features?
Issue I am following this example to determine feature importance using Random Forests. When using numerous features and only a subset of these features, these are the results I observe, respectively: Is there a particular reason the error bars increase
How can I get the train and test scores for each iteration of a MLPRegressor?
Issue This answer seems to be exactly what I need, BUT for a regressor instead of a classifier. https://stackoverflow.com/a/46913459/9726897 I made very minor modifications to the code provided by sascha in that link, as shown below. I thought it would be fairly
Why Regularization strength negative value is not a right approach?
Issue I have a general question regarding training your model when adding the regularization strength λ parameter, as it puts a penalty on your score to prevent over-fitting (as far as I know from class and Tootone answer linked below) So
Give scikit-learn classifier custom training data?
Issue I have been working on this all day (struggled rather). Having read through the documentation, many other tutorials and due to my inexperience, I can’t figure out how to use my own data with a MultinomialNB classifier? Here is
How to only use all features for training, but only 2 features for testing with SciKit learn?
Issue I am building a machine learning model for predicting Premier League (football/soccer) results using this dataset, which has features such as Home Goals, Away Goals, Shots on target etc. This is my code currently after I have loaded the
Run trained Machine Learning model on a different dataset
Issue I am new to Machine Learning and am in the process of trying to run a simple classification model that I trained and saved using pickle, on another dataset of the same format. I have the following python code.
Python: How to fit a model with user defined functions
Issue I’m working on the isolation forest. I implemented this code in order to build an isolation forest that contains iTrees. import pandas as pd import numpy as np import random from sklearn.model_selection import train_test_split class ExNode: def __init__(self,size): self.size=size class
Preparing training data sets
Issue When preparing a training data set, do I need to remove the target variable data from the training data set or is it fine to leave it in? So, should X = df[:,:] in the code below exclude the
reshape machine learning input data for different algorithms
Issue I am experimenting with sklearn classification using some NLTK type tutorials. Can someone help me understand why the sklearn MLP neural network can handle different input shapes but the other classifiers cannot? My input training data is a
adding more data to Support Vector Classifier training
Issue I am using the LinearSVC() available on scikit learn to classify texts into a maximum of seven labels. So, it is a multilabel classification problem. I am training on a small amount of data and testing it. Now,
How to override Sklearn module function
Issue I’m using sklearn.metrics.cohen_kappa_score to evaluate my module. The function’s weights parameter can be None, ‘linear’ or ‘quadratic’. I would like to override the function so that I can pass a custom weights matrix. How can it be done?
passing an extra argument to GenericUnivariateSelect without scope tricks
Issue EDIT: here is the complete traceback if I apply the make_scorer workaround suggested in the answers… `File “________python/anaconda-2.7.11-64/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py”, line 880, in runfile execfile(filename, namespace) File “”________python/anaconda-2.7.11-64/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py”, line 94, in execfile builtins.execfile(filename, *where) File “”________/main_”________.py”, line 43, in <module> “_________index.fit(X,Y
Python SKLearn training test data
Issue This is my first time working on machine learning. I have an assignment to run Logistic and Bayesian Regression from Sklearn on apple stock returns and compare that with linear regression + tensor flow. I am not sure if
Fitting Training Labels on a 2D List in Scikit-learn
Issue I am trying to map rows in a 2D list to elements in a list of labels with Scikit-learn. For example: from sklearn import tree clf = DecisionTreeClassifier() #2D list of training data: training_data = [[1, 2, 3],
How to understand a function which splits the data
Issue Can someone help me understand what this function does? I understand up to the line print but after that I’m a bit lost. Starting from train_data. def stratifiedShuffleSplit_data(X, y): sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0) for train_index, test_index in sss.split(X,
Machine Learning Training & Test data split method
Issue I was running a random forest classification model and initially divided the data into train (80%) and test (20%). However, the prediction had too many false positives, which I think was because there was too much noise in training
ValueError: multiclass format is not supported
Issue While I am trying to use metrics.roc_auc_score, I am getting ValueError: multiclass format is not supported. import lightgbm as lgb from sklearn import metrics def train_model(train, valid): dtrain = lgb.Dataset(train, label=y_train) dvalid = lgb.Dataset(valid, label=y_valid) param = {‘num_leaves’: 64,
Select a random subset of data
Issue I have a dataset given to me that was previously split into training and validation (test) data. I need to further split the training data into a separate training set and calibration set. I don’t want to touch my
Why does `partial_fit` in `SGDClassifier` suffer from gradual reduction in model accuracy
Issue I am training an online-learning SVM classifier using SGDClassifier in sklearn. I learnt that this is possible using partial_fit. My model definition is: model = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=3000, tol=1e-3, shuffle=True, verbose=0, learning_rate='invscaling', eta0=0.01, early_stopping=False) and it is
How can I specify a training set and test set from separate dataframes?
Issue I have a dataframe with a mixture of news articles and Facebook posts (full texts) with a corresponding label (a single set of labels for all the texts – both the articles and the posts). However, I want to
Need help understanding the MLPClassifier
Issue I’ve been working with the MLPClassifier for a while and I think I had a wrong interpretation of what the function is doing for the whole time and I think I got it right now, but I am not
Re-fitting a saved scikit-learn model without some features not used – "ValueError: A given column is not a column of the dataframe"
Issue I’d need to re-fit a scikit-learn pipeline using a smaller dataset, without some features that are actually not used by the model. (The actual situation is that I’m saving it through joblib and loading it in another file where
How to measure xgboost regressor accuracy using accuracy_score (or other suggested function)
Issue I’m writing code to solve a simple problem: predicting the probability of an item missing from an inventory. I’m using the XGBoost prediction model to do this. I have the data split in two .csv files, one
Forecast next day without train and test split
Issue Typically when we have a data frame we split it into train and test. For example, imagine my data frame is something like this: > df.head() Date y wind temperature 1 2019-10-03 00:00:00 33 12 15 2 2019-10-03 01:00:00
What happens when we apply .fit() method to a kNN model in Scikit-learn if kNN has no training phase?
Issue Since kNN handles both training and prediction at the RAM level and requires no explicit training process, what exactly happens when a knn model is being fitted? I thought this step was related to training the model. Thank you.
using sklearn.train_test_split for Imbalanced data
Issue I have a very imbalanced dataset. I used sklearn.train_test_split function to extract the train dataset. Now I want to oversample the train dataset, so I used to count number of type1(my data set has 2 categories and types(type1 and
plotting Iris Classification
Issue The code below classifies three groups of Iris through the Decision Tree classifier. import pandas as pd from sklearn import datasets from sklearn.model_selection import train_test_split, cross_val_score, KFold from sklearn.tree import DecisionTreeClassifier iris = datasets.load_iris() dataset = pd.DataFrame(iris[‘data’], columns=iris[‘feature_names’]) dataset[‘target’]
Using Column Transformer in Scikit to preprocess train and test data with target variable
Issue I am having trouble preprocessing the dataset as a whole with ColumnTransformer. Maybe you can help: First I read in my dataset: X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.2, random_state=seed) Then I do my preprocessing: preprocessor =
Train Model fails because 'list' object has no attribute 'lower'
Issue I am training a classifier over tweets for sentiment analysis purposes. The code is the following: df = pd.read_csv(‘Trainded Dataset Sentiment.csv’, error_bad_lines=False) df.head(5) #TWEET X = df[[‘SentimentText’]].loc[2:50000] #SENTIMENT LABEL y = df[[‘Sentiment’]].loc[2:50000] #Apply Normalizer function over the tweets X[‘Normalized
Applying undersampling techniques to train and test data
Issue I know if you perform some sort of transformation and you use fit() then you have to transform() both the training set and the test set. Suppose you apply a targeted undersampling technique such as TomekLinks to your training
Scikit learn train_test_split into Pytorch Dataloader
Issue I have a dataset for binary classification with PNGs titled as in the attachment below, where the first 0 or 1 in the title determines its class. They’re in a folder called "annotation_class", and I have a small script
How to pass different set of data to train and test without splitting a dataframe. (python)?
Issue I have gone through multiple questions that help divide your dataframe into train and test, with scikit, without etc. But my question is I have 2 different csvs ( 2 different dataframes from different years). I want to use
Is it possible to get the number of rows of the training set from a LGBMClassifier?
Issue I have trained a model using lightgbm.sklearn.LGBMClassifier from the lightgbm package. I can find out the number of columns and column names of the training data from the model but I have not found a way to find the row
Parameter "stratify" from method "train_test_split" (scikit Learn)
Issue I am trying to use train_test_split from package scikit Learn, but I am having trouble with parameter stratify. Hereafter is the code: from sklearn import cross_validation, datasets X = iris.data[:,:2] y = iris.target cross_validation.train_test_split(X,y,stratify=y) However, I keep getting the
Is this a valid approach to scale your target in machine learning without leaking information?
Issue Consider a housing price dataset, where the goal is to predict the sale price. I would like to do this by predicting the "Sale price per Squaremeter" instead, since it yields better results. The question is if I implement
How to use sklearn's standard scaler with make_pipeline?
Issue I am used to running sklearn’s standard scaler the following way: from sklearn.preprocessing import StandardScaler scaler = StandardScaler().fit(X_train) scaled_X_train = scaler.transform(X_train) Where X_train is an array containing the features in my training dataset. I may then use the same
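For reference, the manual fit/transform pattern quoted in this excerpt can be folded into a pipeline. The sketch below is only an illustration, not the answer from the original post: the synthetic data, the X_test array, and the choice of LogisticRegression as the final estimator are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely illustrative (not from the original question).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(80, 4))
y_train = (X_train[:, 0] > 5.0).astype(int)
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

# The pipeline fits the scaler on X_train only, then fits the estimator
# on the scaled training data. LogisticRegression is an assumed stand-in.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# predict() applies the already-fitted scaler to X_test before predicting.
predictions = pipe.predict(X_test)

# The manual approach from the excerpt gives the same scaled training data.
scaler = StandardScaler().fit(X_train)
manual_scaled = scaler.transform(X_train)
# make_pipeline names each step after its lowercased class name.
pipeline_scaled = pipe.named_steps["standardscaler"].transform(X_train)
```

Because the scaler is fitted inside pipe.fit, the test data never influences the scaling statistics, which matches what the manual version does when transform is called with the training-fitted scaler.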