tagging words with different lengths in order

Issue

Hi, I am trying to tag the words in a sentence in order.
For example, this is my initial method:

Sentence: Work  across  a  wide  range  of  related  areas
Label:    Tag   O       O  O     O      O   Tag      Tag

But now I need it to work like this, where two words can be tagged as one keyword and labeled together:

Sentence: Work  across  a  wide  range  of  related areas
Label:    Tag   O       O  O     O      O   Tag

I have a list of keywords of varying length, along with their tags. How can I tag the sentence this way while keeping the words in sentence order?

Solution

It looks like what you are looking for is the BIO tagging scheme (if I understood you correctly and you are looking for a solution for a manually tagged corpus).

BIO denotes the following: B marks the beginning of a chunk, I marks a token inside the chunk, and O marks a token outside of any chunk.
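Since the question starts from a list of keywords of varying length, here is a minimal Python sketch of producing BIO labels directly from such a list. It assumes whitespace tokenization, case-insensitive matching, and greedy longest-match (a longer keyword wins over a shorter one starting at the same word). The answer does not prescribe a matching strategy, and `bio_tag` and the keyword dictionary are hypothetical.

```python
def bio_tag(tokens, keywords):
    """Return one BIO-prefixed label per token, in sentence order.

    keywords maps a tuple of lowercased tokens (a keyword of any length)
    to its label. Matching is greedy longest-match from left to right.
    """
    # Longest keyword, so we know how far ahead to look at each position.
    max_len = max((len(k) for k in keywords), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = keywords.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"          # first word of the keyword
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"      # remaining words of the keyword
                i += n
                break
        else:
            # No keyword starts at this token, so it stays O.
            i += 1
    return tags


if __name__ == "__main__":
    tokens = "Work across a wide range of related areas".split()
    # Hypothetical keyword list: one single-word and one two-word keyword.
    keywords = {("work",): "Label_1", ("related", "areas"): "Label_2"}
    print(bio_tag(tokens, keywords))
    # ['B-Label_1', 'O', 'O', 'O', 'O', 'O', 'B-Label_2', 'I-Label_2']
```

Trying longer phrases first is what lets "related areas" be tagged as one two-word keyword even if "related" alone were also in the list.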

Step 1

Sentence: Work     across  a  wide  range  of  related  areas
Tag:      B        O       O  O     O      O   B        I
Label:    Label_1  O       O  O     O      O   Label_2  Label_2

Step 2

Sentence: Work       across  a  wide  range  of  related    areas
Label:    B-Label_1  O       O  O     O      O   B-Label_2  I-Label_2
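Going from Step 1 to Step 2 is just a matter of joining each BIO tag to its label. A minimal Python sketch, assuming the tag row and the label row are parallel lists of equal length (`combine_tags_and_labels` is a hypothetical name):

```python
def combine_tags_and_labels(tags, labels):
    """Merge a BIO tag row and a label row into BIO-prefixed labels."""
    if len(tags) != len(labels):
        raise ValueError("tags and labels must have the same length")
    combined = []
    for tag, label in zip(tags, labels):
        # O tokens stay O; B and I become a prefix on the label.
        combined.append("O" if tag == "O" else f"{tag}-{label}")
    return combined


if __name__ == "__main__":
    tags = ["B", "O", "O", "O", "O", "O", "B", "I"]
    labels = ["Label_1", "O", "O", "O", "O", "O", "Label_2", "Label_2"]
    print(combine_tags_and_labels(tags, labels))
    # ['B-Label_1', 'O', 'O', 'O', 'O', 'O', 'B-Label_2', 'I-Label_2']
```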

Once you have tagged your corpus, align the list of sentence tokens (list #1) with the list of tag + label combinations (list #2),
so that the BIO tags are prefixed to your labels, e.g., […related, areas] + [… B-Label_2, I-Label_2].
That way you can combine [B-Label_2, I-Label_2] into a single Label_2, since a B followed by an I with the same label marks one chunk. You will still have to strip the prefixes at the very end, plus whatever intermediate steps and post-processing your pipeline needs.
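The combining and prefix-stripping described above can be sketched in Python as follows. This is a minimal sketch, not a standard API: `bio_to_spans` is a hypothetical helper, it assumes tokens and BIO labels are already aligned one-to-one, and it treats an I- tag that does not continue a matching chunk as the start of a new chunk (one common lenient choice).

```python
def bio_to_spans(tokens, bio_labels):
    """Merge B-/I- runs into single (phrase, label) pairs, keeping sentence order.

    O tokens are kept as (token, "O") so the output lines up with the sentence.
    """
    if len(tokens) != len(bio_labels):
        raise ValueError("tokens and bio_labels must have the same length")
    spans = []          # finished (phrase, label) pairs
    chunk = []          # tokens of the chunk currently being built
    chunk_label = None  # label of that chunk, with the B-/I- prefix stripped
    for token, tag in zip(tokens, bio_labels):
        # "B-Label_2" -> ("B", "-", "Label_2"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == chunk_label:
            chunk.append(token)  # continue the open chunk
            continue
        if chunk_label is not None:
            spans.append((" ".join(chunk), chunk_label))  # close the open chunk
        if prefix in ("B", "I"):
            # B- starts a new chunk; a stray I- is treated the same way.
            chunk, chunk_label = [token], label
        else:
            chunk, chunk_label = [], None
            spans.append((token, "O"))
    if chunk_label is not None:
        spans.append((" ".join(chunk), chunk_label))  # chunk ending the sentence
    return spans


if __name__ == "__main__":
    tokens = "Work across a wide range of related areas".split()
    bio = ["B-Label_1", "O", "O", "O", "O", "O", "B-Label_2", "I-Label_2"]
    for phrase, label in bio_to_spans(tokens, bio):
        print(f"{phrase}\t{label}")
```

For the example sentence this prints seven rows, with "related areas" as a single Label_2 entry, which is the labeling the question asks for.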

Answered By – Anscandance

Answer Checked By – Marilyn (AngularFixing Volunteer)
