One of the vital modules in the optical character recognition (OCR) pipeline is text detection and segmentation, also called text localization. In this post, we will apply various preprocessing techniques to the input image and see how to localize text in the enhanced image, so that we can feed the segments to our text recognition network.
Image Preprocessing
Images can be distorted, noisy, or suffer from other problems that reduce OCR accuracy. To build a better OCR pipeline, we need to do some image preprocessing.
Grayscale the image: Generally you will get an image with 3 channels (a color image). We need to convert this image into grayscale form, which contains only one channel. We can also process images with three channels, but that only increases the complexity of the model and the processing time. OpenCV provides a built-in function that can do it for you.
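For example, assuming the image was read with cv2.imread (OpenCV loads color images in BGR order), the conversion looks like this:

import cv2

# read the color image (BGR) and convert it to a single-channel grayscale image
image = cv2.imread(image_path)
grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)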
Alternatively, you can convert the image to grayscale while reading it:
# OpenCV reads images in BGR format; this flag loads the image as grayscale directly
grayscale_image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
Noise reduction: Images come with various types of noise. OpenCV provides several noise reduction functions; here I am using the Non-local Means Denoising algorithm.
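A minimal sketch using OpenCV's grayscale variant of the algorithm (the filter strength h=10 and the window sizes 7 and 21 below are common defaults, not values from the original post):

# apply Non-local Means Denoising to the grayscale image
# h=10 controls filter strength; 7 and 21 are the template and search window sizes
denoised = cv2.fastNlMeansDenoising(grayscale_image, None, 10, 7, 21)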
Contrast adjustment: Sometimes we have low-contrast images. This makes it difficult to separate text from the image background. We need high-contrast text images for the localization process. We can increase image contrast using Contrast Limited Adaptive Histogram Equalization (CLAHE), among many other contrast enhancement methods provided by skimage.
from skimage import exposure

# apply CLAHE; the result is a float image with values scaled to [0, 1]
contrast_enhanced_image = exposure.equalize_adapthist(denoised, clip_limit=0.03)
Now that we are done with image preprocessing, let us move on to the second part: text localization.
Text Localization
In this part, we will see how to detect a large number of text region candidates and progressively remove those less likely to contain text. We use the MSER feature detector to find text candidates in the image. It works well for text because the consistent color and high contrast of text lead to stable intensity profiles.
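A sketch of the detection step with OpenCV's MSER implementation (the names grayImg, regions and mser_bboxes are assumptions chosen to match the filtering code later in the post):

import numpy as np

# MSER expects an 8-bit image, so rescale the CLAHE output from [0, 1] back to [0, 255]
grayImg = (contrast_enhanced_image * 255).astype(np.uint8)

# detect Maximally Stable Extremal Regions; detectRegions returns the
# point list for each region along with its bounding box
mser = cv2.MSER_create()
regions, mser_bboxes = mser.detectRegions(grayImg)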
Along with the text, MSER picks up many other stable regions that are not text. Now the geometric properties of text can be used to filter out non-text regions using simple thresholds.
Before moving on with the filtering process, let's write some functions to display the results in a comprehensible manner.
import numpy as np
import matplotlib.pyplot as plt

# display a variable number of (image, title) pairs in a grid
def pltShow(*images):
    # count the number of images to show
    count = len(images)
    # three images per row
    nRow = int(np.ceil(count / 3.))
    for i in range(count):
        plt.subplot(nRow, 3, i + 1)
        # grayscale images have 2 dimensions, color images have 3
        if len(images[i][0].shape) == 2:
            plt.imshow(images[i][0], cmap='gray')
        else:
            plt.imshow(images[i][0])
        # remove the x-y axes from the subplots
        plt.xticks([])
        plt.yticks([])
        plt.title(images[i][1])
    plt.show()
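For example, to compare the preprocessing stages side by side:

# show the intermediate results of the preprocessing pipeline
pltShow((grayscale_image, 'grayscale'),
        (denoised, 'denoised'),
        (contrast_enhanced_image, 'CLAHE'))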
# compute the perimeter of a region by counting edge pixels inside its bounding box
def getRegionPerimeter(image, region):
    # get the top-left coordinate, width and height of the box enclosing the region
    x, y, w, h = cv2.boundingRect(region)
    return len(np.where(image[y:y+h, x:x+w] != 0)[0])

# compute the aspect ratio of a region's bounding box
def getAspectRatio(region):
    return (1.0 * max(getRegionShape(region))) / (min(getRegionShape(region)) + 1e-4)
# compute the fraction of the bounding box area occupied by the region
def getOccupyRate(region):
    return (1.0 * getRegionArea(region)) / \
           (getRegionShape(region)[0] * getRegionShape(region)[1] + 1.0e-10)

# compute the compactness of the region (area relative to squared perimeter)
def getCompactness(image, region):
    return (1.0 * getRegionArea(region)) / (1.0 * getRegionPerimeter(image, region) ** 2)
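These filters also rely on two helpers, getRegionArea and getRegionShape, that the post does not show. A minimal sketch of what they plausibly look like, inferred from how they are used above (treat these as assumptions, not the author's exact code):

# hypothetical helper: the area of an MSER region is its pixel count
def getRegionArea(region):
    return len(region)

# hypothetical helper: the (width, height) of the region's bounding box
def getRegionShape(region):
    x, y, w, h = cv2.boundingRect(region)
    return (w, h)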
Now apply these methods to filter out text regions as follows:
# AREA_LIM, PERIMETER_LIM, ASPECT_RATIO_LIM, OCCUPATION_LIM and COMPACTNESS_LIM
# are the filtering thresholds; res..res6 are image copies that colorRegion
# paints to visualize each stage; cannyImg is the edge image used by the
# perimeter and compactness tests
n1 = len(regions)  # total number of MSER regions
n2 = n3 = n4 = n5 = n6 = 0
bboxes = []
for i, region in enumerate(regions):
    colorRegion(res, region)
    # keep regions covering at least AREA_LIM of the image area
    if getRegionArea(region) > grayImg.shape[0] * grayImg.shape[1] * AREA_LIM:
        n2 += 1  # number of regions meeting the area criterion
        colorRegion(res2, region)
        # keep regions with a sufficiently long perimeter
        if getRegionPerimeter(cannyImg, region) > \
                2 * (grayImg.shape[0] + grayImg.shape[1]) * PERIMETER_LIM:
            n3 += 1  # number of regions meeting the perimeter criterion
            colorRegion(res3, region)
            # keep regions that are not overly elongated
            if getAspectRatio(region) < ASPECT_RATIO_LIM:
                n4 += 1  # number of regions meeting the aspect ratio criterion
                colorRegion(res4, region)
                # keep regions that fill a plausible fraction of their box
                if OCCUPATION_LIM[0] < getOccupyRate(region) < OCCUPATION_LIM[1]:
                    n5 += 1
                    colorRegion(res5, region)
                    # keep regions with text-like compactness
                    if COMPACTNESS_LIM[0] < getCompactness(cannyImg, region) < COMPACTNESS_LIM[1]:
                        n6 += 1  # final number of regions left
                        colorRegion(res6, region)
                        bboxes.append(mser_bboxes[i])
After eliminating non-text regions, I draw bounding boxes on the remaining regions and voila, we have successfully detected and segmented the characters on the number plate. Note: apply NMS (non-maximum suppression) to remove overlapping bounding boxes, as sketched after the drawing code below.
# draw a bounding box (with a 1-pixel margin) around each surviving region
for bbox in bboxes:
    cv2.rectangle(img, (bbox[0]-1, bbox[1]-1),
                  (bbox[0]+bbox[2]+1, bbox[1]+bbox[3]+1), (255, 0, 0), 1)
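The post leaves the NMS step as a note; here is a minimal greedy IoU-based sketch (the nms name and the 0.5 overlap threshold are my assumptions) that would run on bboxes before the drawing loop above:

# greedy non-maximum suppression over (x, y, w, h) boxes
def nms(boxes, overlap_thresh=0.5):
    boxes = np.array(boxes, dtype=float)
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = boxes[:, 0] + boxes[:, 2], boxes[:, 1] + boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    # process boxes from largest to smallest
    order = areas.argsort()[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current box with the remaining ones
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes overlapping the kept one by more than the threshold
        order = order[1:][iou <= overlap_thresh]
    return [boxes[i].astype(int) for i in keep]

bboxes = nms(bboxes)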