"Dropped event" messages in OpenNLP: training data is dropped during training

Issue

I have labeled data (label and text), like this:

category1, "train message 1"
category1, "train message 2"
category1, "train message 3"
category2, "train message 4"
category2, "train message 5"

I am trying to train a document categorizer model with the Java OpenNLP library:

DoccatModel model = DocumentCategorizerME.train("pt", sampleStream, params, customFactory);

When I train the model, I get strange messages:

Indexing events using cutoff of 5
Computing event counts...  done. 5441 events
Dropped event animals*:[bow=live, bow=animals, ng=:live:animals]
Dropped event animals*:[bow=aquariums]
Dropped event animals*:[bow=aquatic, bow=plant, bow=fertilizers, ng=:aquatic:plant,ng=:aquatic:plant:fertilizers, ng=:plant:fertilizers]
Dropped event apparel*:[bow=activewear]
Dropped event apparel*:[bow=one, bow=pieces, ng=:one:pieces]

What does Dropped event "category": […] mean?

Solution

I added a custom factory, and it works:

// Generate bag-of-words features plus n-grams of 2 to 3 tokens
int minNgramSize = 2;
int maxNgramSize = 3;
DoccatFactory customFactory = new DoccatFactory(new FeatureGenerator[] {
    new BagOfWordsFeatureGenerator(),
    new NGramFeatureGenerator(minNgramSize, maxNgramSize)
});
// sampleStream and params are created elsewhere (not shown in this post)
DoccatModel model = DocumentCategorizerME.train("pt", sampleStream, params, customFactory);
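
As for what the message itself appears to mean: the log line "Indexing events using cutoff of 5" suggests that features seen fewer than 5 times in the training data are discarded, and that an event left with no surviving features is reported as dropped. The sketch below is a simplified, hypothetical stand-in for that filtering, not OpenNLP's own code; the class name `CutoffSketch` and method `droppedEvents` are made up for illustration. If this reading is right, lowering the cutoff through the training parameters (OpenNLP has a `TrainingParameters.CUTOFF_PARAM` key) or adding more training data should reduce the number of dropped events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified illustration of cutoff-based event dropping.
// This is an assumption about the behaviour, not OpenNLP's implementation.
class CutoffSketch {

    // Returns the events whose features ALL occur fewer than `cutoff` times
    // across the whole training set, i.e. the events that would be dropped.
    static List<List<String>> droppedEvents(List<List<String>> events, int cutoff) {
        // First pass: count how often each feature occurs over all events.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> event : events) {
            for (String feature : event) {
                counts.merge(feature, 1, Integer::sum);
            }
        }

        // Second pass: keep an event only if at least one feature meets the cutoff.
        List<List<String>> dropped = new ArrayList<>();
        for (List<String> event : events) {
            boolean anySurvives = false;
            for (String feature : event) {
                if (counts.get(feature) >= cutoff) {
                    anySurvives = true;
                    break;
                }
            }
            if (!anySurvives) {
                dropped.add(event);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<List<String>> events = Arrays.asList(
            Arrays.asList("bow=live", "bow=animals"),
            Arrays.asList("bow=live", "bow=fish"),
            Arrays.asList("bow=aquariums"));

        // With a cutoff of 2, only "bow=live" is frequent enough to survive,
        // so the event that lacks it is dropped.
        for (List<String> event : droppedEvents(events, 2)) {
            System.out.println("Dropped event " + event);
        }
    }
}
```

In this toy data set a cutoff of 1 drops nothing, while a cutoff of 3 drops every event, which mirrors why rare words in a small training set tend to trigger the message.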

Answered By – Alex Titov

Answer Checked By – Marilyn (AngularFixing Volunteer)
