How to label training data for Tesseract

Issue

I want to train my own model to detect and recognize ID cards with Tesseract, and to extract key information such as the name and ID number from them. The data looks like this: [sample of data]

The training instructions only accept single-line text as input. I’m confused about how to train the detection model in Tesseract, and whether I should label single characters or the whole text line in each box. (https://github.com/tesseract-ocr/tesstrain)

Solution

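Oh, I think I get it. Tesseract doesn’t need a separate detection model to get the position of a text line: it recognizes each blob (letter) and uses the positions of the letters to locate the text line. So for tesstrain, the ground truth is labeled per text line (a cropped single-line image plus its transcription), not per character.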
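
To make the labeling concrete, here is a minimal sketch of preparing line-level ground truth in the layout the tesstrain README describes: each cropped line image (.tif or .png) sits next to a single-line transcription with the same base name and a .gt.txt extension. The helper name write_ground_truth and the example file names are my own, not part of tesstrain, and the sketch assumes the ID card has already been cropped into one image per text line.

```python
from pathlib import Path

# Plain image extensions handled by this sketch. tesstrain also accepts
# .bin.png / .nrm.png, which this hypothetical helper does not cover.
IMAGE_SUFFIXES = (".tif", ".png")


def write_ground_truth(gt_dir, transcriptions):
    """Write one single-line .gt.txt file next to each line image.

    gt_dir         -- folder holding the cropped line images
    transcriptions -- dict mapping image file name -> text in that image
    Returns the list of .gt.txt paths that were written.
    """
    gt_dir = Path(gt_dir)
    written = []
    for image_name, text in sorted(transcriptions.items()):
        image_path = gt_dir / image_name
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            raise ValueError(f"unsupported image type: {image_name}")
        if not image_path.is_file():
            raise FileNotFoundError(f"missing line image: {image_path}")

        # The transcription must be a single line, so collapse any
        # newlines or repeated spaces into single spaces.
        line = " ".join(text.split())
        if not line:
            raise ValueError(f"empty transcription for {image_name}")

        # card_001_name.png -> card_001_name.gt.txt
        gt_path = image_path.with_name(image_path.stem + ".gt.txt")
        gt_path.write_text(line + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

For example, calling write_ground_truth("data/idcard-ground-truth", {"card_001_name.png": "JOHN DOE"}) would create card_001_name.gt.txt beside the image; "idcard" is a placeholder model name. With the folder named data/MODEL_NAME-ground-truth, training is then started with make training MODEL_NAME=idcard, as far as I understand the tesstrain README.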

[Figure: Tesseract data process]

Answered By – zzzz

Answer Checked By – Timothy Miller (AngularFixing Admin)
