I have trained a model using lightgbm.sklearn.LGBMClassifier from the lightgbm package. I can get the number of columns and the column names of the training data from the model, but I have not found a way to get the number of rows. Is that possible? Ideally I would recover the training data itself from the model, but I have not come across anything like that.
    # This gives the number of columns the model was trained with
    lgbm_model.n_features_

    # Any way to find out the number of rows in the training data as well?
    lgbm_model.n_instances_  # does not exist!
The tree structure of a LightGBM model includes information about how many records from the training data would fall into each node in the tree if that node were a leaf node. In LightGBM's code, this value is called internal_count.

Since all data matches the root node of each tree, in most situations you can use that information to figure out, given a LightGBM model, how many instances were in the training data.
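To make the node counts concrete, here is a minimal sketch that walks a single tree in the nested-dictionary form that dump_model() returns. The tree below is hand-written and its numbers are invented for illustration, and sum_leaf_counts is a hypothetical helper, not part of LightGBM. Only the key names (internal_count, leaf_count, left_child, right_child) are taken from the dump format.

```python
# Hypothetical, hand-written tree shaped like one "tree_structure" entry
# from Booster.dump_model(); the numbers are invented for illustration.
mock_tree = {
    "internal_count": 1234,
    "left_child": {"leaf_index": 0, "leaf_count": 600},
    "right_child": {
        "internal_count": 634,
        "left_child": {"leaf_index": 1, "leaf_count": 300},
        "right_child": {"leaf_index": 2, "leaf_count": 334},
    },
}


def sum_leaf_counts(node):
    """Add up leaf_count over every leaf at or below this node."""
    if "left_child" not in node:  # a leaf has no children
        return node.get("leaf_count", 0)
    return sum_leaf_counts(node["left_child"]) + sum_leaf_counts(node["right_child"])


total = sum_leaf_counts(mock_tree)
# The leaves partition the records that reach the root, so the two numbers agree.
print(total, mock_tree["internal_count"])
```

Because every record ends up in exactly one leaf, the leaf counts add up to the count stored at the root, which is why the root is the only node you need to read.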
Consider the following example, using lightgbm==3.3.2 and Python 3.8.8.
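    import lightgbm as lgb
    from sklearn.datasets import make_blobs

    # Two well-separated clusters, 1234 rows in total
    X, y = make_blobs(n_samples=1234, centers=[[-4, -4], [-4, 4]])

    clf = lgb.LGBMClassifier(num_iterations=10, subsample=0.5)
    clf.fit(X, y)

    # "tree_info" is a list with one entry per tree, so index the first tree
    # and read the record count stored at its root node.
    first_tree = clf.booster_.dump_model()["tree_info"][0]
    num_data = first_tree["tree_structure"]["internal_count"]
    print(num_data)
    # 1234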
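The same lookup can be repeated for every tree in the model. The sketch below assumes only the dictionary layout of dump_model(); root_counts is a hypothetical helper name, and mock_dump is an invented stand-in so the snippet runs without a trained model. It also assumes that a tree which never split is stored as a bare leaf without internal_count, in which case the helper falls back to leaf_count, or to None if neither key is present.

```python
def root_counts(model_dump):
    """Return the record count stored at the root of every tree."""
    counts = []
    for tree in model_dump["tree_info"]:
        root = tree["tree_structure"]
        # Assumption: an unsplit tree may have no internal_count at its root.
        counts.append(root.get("internal_count", root.get("leaf_count")))
    return counts


# Invented stand-in for clf.booster_.dump_model(), trimmed to the keys used here.
mock_dump = {
    "tree_info": [
        {"tree_index": 0, "tree_structure": {"internal_count": 1234}},
        {"tree_index": 1, "tree_structure": {"internal_count": 1234}},
        {"tree_index": 2, "tree_structure": {"leaf_value": 0.0}},
    ]
}

counts = root_counts(mock_dump)
print(counts)
```

With a real model you would pass clf.booster_.dump_model() instead of mock_dump.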
This will work in most cases. There are two special circumstances where this number could be misleading as an answer to the question "how much data was used to train this model":
- if you set bagging_fraction<1.0, then at each iteration LightGBM will only use a fraction of the training data to evaluate splits (see the LightGBM docs for details on this parameter), so the count at the root of a tree can be smaller than the full training set
- if you use "training continuation", where you take an existing model and perform additional boosting rounds, and you use a different training set for those additional boosting rounds, then "how much data was used to train this model" will have a complicated answer that depends on which range of boosting rounds you’re referring to by "this model"
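One rough way to spot the second circumstance is to compare the root counts across trees. The sketch below is an assumption-laden heuristic, not a LightGBM feature: describe_root_counts is a hypothetical helper that takes a plain list of per-tree root counts (for example, one collected by reading internal_count from each tree), and the sample lists are invented.

```python
def describe_root_counts(counts):
    """Summarize per-tree root counts, ignoring trees with no count (None)."""
    known = [c for c in counts if c is not None]
    if not known:
        return "no counts available"
    if min(known) == max(known):
        return f"all trees report {known[0]} rows"
    return f"counts vary from {min(known)} to {max(known)} rows"


# Invented examples: a single training set, then a model continued on a smaller one.
same = describe_root_counts([1234] * 10)
mixed = describe_root_counts([1234] * 5 + [617] * 5)
print(same)
print(mixed)
```

Varying counts suggest that different boosting rounds saw different amounts of data. A constant count does not rule out bagging, because a fixed bagging fraction would produce the same reduced count at every iteration.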
Answered By – James Lamb