How to find and crop words into individual images with Python OpenCV?


enter image description here

I have a binary image of words as shown, and I want to crop it so that each character ends up in a separate image. The output should be separate images of k, 7, 2, f, 5 and m. I tried using OpenCV in Python, but for some reason I'm not able to extract them. Drawing a bounding box around each character would also be good enough.


Here’s a simple approach:

  • Convert to grayscale
  • Otsu’s threshold
  • Find contours, sort contours from left-to-right, and filter using contour area
  • Extract ROI
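In case the second step is unfamiliar: Otsu's method picks the threshold that best separates the histogram into two classes. Below is a rough NumPy-only sketch of the idea, not OpenCV's implementation. `otsu_threshold` is a name made up for illustration, the input is assumed to be a 2-D uint8 array, and the result may differ from `cv2.threshold` in tie-breaking details.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Assumes gray is a 2-D uint8 array; pixels > t count as foreground.
    (Hypothetical helper for illustration only.)
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    w0 = np.cumsum(hist)             # number of pixels with value <= t
    w1 = hist.sum() - w0             # number of pixels with value > t
    sum0 = np.cumsum(hist * levels)  # intensity sum of the lower class
    sum1 = sum0[-1] - sum0           # intensity sum of the upper class

    # Class means; the maximum() guard avoids division by zero for empty classes
    mu0 = sum0 / np.maximum(w0, 1)
    mu1 = sum1 / np.maximum(w1, 1)

    # Proportional to the between-class variance; zero when a class is empty
    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))

# Synthetic example: dark background (50) with a bright block (200)
gray = np.full((20, 20), 50, dtype=np.uint8)
gray[5:15, 5:15] = 200
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
```

In practice you would just pass `cv2.THRESH_OTSU` to `cv2.threshold`; the sketch is only meant to show what that flag computes.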

After Otsu's thresholding to obtain a binary image, we sort the contours from left to right using imutils.contours.sort_contours(). This ensures that when we iterate over the contours, the characters come out in reading order. In addition, we discard contours below a minimum area to remove small noise. Here are the detected characters:
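If you would rather not depend on imutils, the sort-and-filter step can be approximated in plain Python. This is a minimal sketch under some assumptions: `sort_and_filter_boxes` is a hypothetical helper, the boxes are `(x, y, w, h)` tuples like the ones `cv2.boundingRect` returns, and it filters on bounding-box area rather than contour area, so the cutoff is not exactly equivalent.

```python
def sort_and_filter_boxes(boxes, min_area=10):
    """Drop small boxes, then order the rest left-to-right by x.

    boxes: iterable of (x, y, w, h) tuples, as returned by cv2.boundingRect.
    Note: uses bounding-box area (w * h) as a stand-in for contour area.
    """
    kept = [b for b in boxes if b[2] * b[3] > min_area]
    return sorted(kept, key=lambda b: b[0])

# Made-up boxes in arbitrary order; (60, 20, 2, 2) plays the role of noise
boxes = [(120, 5, 20, 30), (10, 8, 18, 28), (60, 20, 2, 2), (65, 6, 19, 29)]
ordered = sort_and_filter_boxes(boxes)
```

Sorting by the x coordinate only works for a single line of text; multiple lines would need grouping by row first.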

enter image description here

We can extract each character using NumPy slicing. Here is each saved character ROI:
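To make the slicing concrete, here is a small sketch on a synthetic array instead of a real image. `crop_roi` is a hypothetical helper, and the box values are made up. Note that rows (y) come first in the index, then columns (x).

```python
import numpy as np

def crop_roi(image, box):
    """Return a copy of the region described by box = (x, y, w, h)."""
    x, y, w, h = box
    # Rows are indexed by y, columns by x; copy() detaches the crop from image
    return image[y:y + h, x:x + w].copy()

# Synthetic 40x100 "image" with one white block standing in for a character
image = np.zeros((40, 100), dtype=np.uint8)
image[10:30, 20:35] = 255

roi = crop_roi(image, (20, 10, 15, 20))
```

Without the `copy()`, the slice is a view, so drawing on the original image afterwards would also change the crop.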

enter image description here

If you want the colors the other way around, simply invert the ROI:

ROI = 255 - image[y:y+h, x:x+w]

enter image description here
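A quick check of what the inversion does, as a sketch on a tiny made-up array. It assumes the ROI is uint8 (which is what `cv2.imread` gives you); for other dtypes, subtracting from 255 would not be the right operation.

```python
import numpy as np

# 2x2 stand-in for a character crop: 255 = white, 0 = black
roi = np.array([[0, 255],
                [255, 0]], dtype=np.uint8)

# Subtracting from 255 swaps black and white and keeps the uint8 dtype
inverted = 255 - roi
```

For uint8 images, `cv2.bitwise_not(roi)` should give the same result.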

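import cv2
from imutils import contours

# Load the image, convert to grayscale, and binarize with Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()  # untouched copy to crop from
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)[1]

# Find external contours; the return value differs between OpenCV versions
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts, _ = contours.sort_contours(cnts, method="left-to-right")

ROI_number = 0
for c in cnts:
    # Skip tiny contours, which are most likely noise
    area = cv2.contourArea(c)
    if area > 10:
        x, y, w, h = cv2.boundingRect(c)
        # Crop from the clean copy and invert; drop "255 -" to keep the original colors
        ROI = 255 - original[y:y+h, x:x+w]
        cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        # Draw the bounding box on the display image only
        cv2.rectangle(image, (x, y), (x + w, y + h), (36, 255, 12), 1)
        ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.waitKey()  # keep the windows open until a key is pressed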

Answered By – nathancy

Answer Checked By – Marie Seifert (AngularFixing Admin)
