Python text recognition from screen

    ImageGrab and PyTesseract

    ImageGrab is a Python module that helps to capture the contents of the screen. PyTesseract is an Optical Character Recognition (OCR) tool for Python. Together they can be used to read the contents of a section of the screen.

    Installation –

    Pillow (a maintained fork of PIL)

    pip install Pillow

    PyTesseract

    pip install pytesseract

    Apart from this, the Tesseract executable itself needs to be installed.

    Implementation of code

    The following functions were primarily used in the code –

    pytesseract.image_to_string(image, lang=**language**) – Takes the image and extracts the text of the specified language from it.

    cv2.cvtColor(image, **colour conversion**) – Used to convert the image to grayscale (using cv2.COLOR_BGR2GRAY).

    ImageGrab.grab(bbox=**coordinates of the area of the screen to be captured**) – Used to repeatedly (in a loop) capture a specific part of the screen.

    The objectives of the code are:

    1. To use a loop to repeatedly capture a part of the screen.
    2. To convert the captured image into grayscale.
    3. To use PyTesseract to read the text in it.

    Code: Python code to use ImageGrab and PyTesseract

    import numpy as nm
    import pytesseract
    import cv2
    from PIL import ImageGrab

    def imToString():

        # Path to the tesseract executable
        pytesseract.pytesseract.tesseract_cmd = '**Path to tesseract executable**'

        while True:

            # Capture a fixed region of the screen (left, top, right, bottom)
            cap = ImageGrab.grab(bbox=(700, 300, 1400, 900))

            # Convert the capture to grayscale and run OCR on it
            tesstr = pytesseract.image_to_string(
                    cv2.cvtColor(nm.array(cap), cv2.COLOR_BGR2GRAY),
                    lang='eng')
            print(tesstr)

    imToString()

    Output

    The above code can be used to capture a certain section of the screen and read its text contents.

    Read about the other libraries used in the code:

    Numpy
    OpenCV (cv2)

    In this blog post, we will explain the technology behind the widely used Tesseract engine, which has been upgraded with the latest research in optical character recognition.

    This article will also serve as a how-to guide/tutorial on implementing OCR in Python using the Tesseract engine. We will be walking through the following modules:

    • Tesseract OCR Features
    • Preprocessing for OCR using OpenCV
    • Running Tesseract with CLI and Python
    • Limitations of Tesseract engine

    Table of Contents

    • Introduction
    • Open Source OCR Tools
    • Tesseract OCR
    • OCR with Pytesseract and OpenCV
    • Training Tesseract on custom data
    • Limitations of Tesseract
    • OCR with Nanonets
      • Nanonets and Humans in the Loop
    • Conclusion
      • Further Reading

    Introduction

    OCR stands for Optical Character Recognition. In other words, OCR systems transform a two-dimensional image of text, which could contain machine-printed or handwritten text, from its image representation into machine-readable text. OCR as a process generally consists of several sub-processes performed as accurately as possible. The sub-processes are:

    • Preprocessing of the Image
    • Text Localization
    • Character Segmentation
    • Character Recognition
    • Post Processing

    The sub-processes in the list above can of course differ, but these are roughly the steps needed to approach automatic character recognition. The main aim of OCR software is to identify and capture all the unique words in a written text, across different languages.

    For almost two decades, optical character recognition systems have been widely used to provide automated text entry into computerized systems. Yet in all this time, conventional online OCR systems (like zonal OCR) have never overcome their inability to read more than a handful of type fonts and page formats. Proportionally spaced type (which includes virtually all typeset copy), laser printer fonts, and even many non-proportional typewriter fonts, have remained beyond the reach of these systems. And as a result, conventional OCR has never achieved more than a marginal impact on the total number of documents needing conversion into digital form.

    Optical Character Recognition process (Courtesy)

    Next-generation OCR engines deal with the problems mentioned above really well by utilizing the latest research in the area of deep learning. By leveraging the combination of deep models and the huge datasets publicly available, models achieve state-of-the-art accuracies on given tasks. Nowadays it is also possible to generate synthetic data with different fonts using generative adversarial networks and a few other generative approaches.

    Optical Character Recognition remains a challenging problem when text occurs in unconstrained environments, like natural scenes, due to geometrical distortions, complex backgrounds, and diverse fonts. The technology still holds immense potential due to the various use-cases of deep learning based OCR, like

    • building license plate readers
    • digitizing invoices
    • digitizing menus
    • digitizing ID cards

    Have an OCR problem in mind? Want to reduce your organization's data entry costs? Head over to Nanonets and build OCR models to extract text from images or extract data from PDFs with AI based PDF OCR!

    There is a lot of optical character recognition software available. I did not find any quality comparison between them, but I will write about some that seem to be the most developer-friendly.

    Tesseract - an open-source OCR engine that has gained popularity among OCR developers. Even though it can be painful to implement and modify sometimes, there weren’t too many free and powerful OCR alternatives on the market for the longest time. Tesseract began as a Ph.D. research project in HP Labs, Bristol. It gained popularity and was developed by HP between 1984 and 1994. In 2005 HP released Tesseract as open-source software, and since 2006 it has been developed by Google.

    Google Trends comparison of different open-source OCR tools

    OCRopus - OCRopus is an open-source OCR system allowing easy evaluation and reuse of OCR components by both researchers and companies. It is a collection of document analysis programs, not a turn-key OCR system. To apply it to your documents, you may need to do some image preprocessing, and possibly also train new models. In addition to the recognition scripts themselves, there are several scripts for ground-truth editing and correction, measuring error rates, and determining confusion matrices, which are easy to use and edit.

    Ocular - Ocular works best on documents printed using a hand press, including those written in multiple languages. It operates using the command line. It is a state-of-the-art historical OCR system. Its primary features are:

    • Unsupervised learning of unknown fonts: requires only document images and a corpus of text.
    • Ability to handle noisy documents: inconsistent inking, spacing, vertical alignment
    • Support for multilingual documents, including those that have considerable word-level code-switching.
    • Unsupervised learning of orthographic variation patterns including archaic spellings and printer shorthand.
    • Simultaneous, joint transcription into both diplomatic (literal) and normalized forms.

    SwiftOCR - I will also mention this OCR engine written in Swift, since there is huge development being made into advancing the use of Swift as a programming language for deep learning. SwiftOCR is a fast and simple OCR library that uses neural networks for image recognition. SwiftOCR claims that its engine outperforms the well-known Tesseract library.

    In this blog post, we will focus on Tesseract OCR and find out more about how it works and how it is used.

    Tesseract OCR

    Tesseract is an open source text recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly, or (for programmers) via an API to extract printed text from images. It supports a wide variety of languages. Tesseract doesn't have a built-in GUI, but there are several available from the 3rdParty page. Tesseract is compatible with many programming languages and frameworks through wrappers that can be found here. It can be used with the existing layout analysis to recognize text within a large document, or it can be used in conjunction with an external text detector to recognize text from an image of a single text line.

    OCR Process Flow to build API with Tesseract from a blog post

    Tesseract 4.00 includes a new neural network subsystem configured as a text line recognizer. It has its origins in OCRopus' Python-based LSTM implementation but has been redesigned for Tesseract in C++. The neural network system in Tesseract pre-dates TensorFlow but is compatible with it, as there is a network description language called Variable Graph Specification Language (VGSL), that is also available for TensorFlow.

    To recognize an image containing a single character, we typically use a Convolutional Neural Network (CNN). Text of arbitrary length is a sequence of characters, and such problems are solved using RNNs, with LSTM being a popular form of RNN. Read this post to learn more about LSTMs.

    Technology - How it works

    LSTMs are great at learning sequences but slow down a lot when the number of states is too large. There are empirical results that suggest it is better to ask an LSTM to learn a long sequence than a short sequence of many classes. Tesseract developed from the OCRopus model in Python, which was a fork of an LSTM implementation in C++ called CLSTM. CLSTM is an implementation of the LSTM recurrent neural network model in C++, using the Eigen library for numerical computations.

    Tesseract 3 OCR process from paper

    Legacy Tesseract 3.x depended on a multi-stage process in which we can distinguish the following steps:

    • Word finding
    • Line finding
    • Character classification

    Word finding is done by organizing text lines into blobs, and the lines and regions are analyzed for fixed-pitch or proportional text. Text lines are broken into words differently according to the kind of character spacing. Recognition then proceeds as a two-pass process. In the first pass, an attempt is made to recognize each word in turn. Each word that is satisfactory is passed to an adaptive classifier as training data. The adaptive classifier then gets a chance to more accurately recognize text lower down the page.

    Modernization of the Tesseract tool was an effort in code cleaning and adding a new LSTM model. The input image is processed in boxes (rectangles), line by line, fed into the LSTM model, which produces the output. In the image below we can visualize how it works.

    How Tesseract uses LSTM model presentation

    After adding a new training tool and training the model with a lot of data and fonts, Tesseract achieves better performance. Still, it is not good enough to work on handwritten text and unusual fonts. It is possible to fine-tune or retrain the top layers for experimentation.

    Installing Tesseract

    Installing Tesseract on Windows is easy with the precompiled binaries found here. Do not forget to edit the “path” environment variable and add the Tesseract path. For Linux or Mac, it can be installed with a few commands.
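
    On Debian/Ubuntu and macOS (with Homebrew), for example, the install is typically a single command; package names may vary slightly by distribution:

    $ sudo apt install tesseract-ocr   # Debian/Ubuntu
    $ brew install tesseract           # macOS (Homebrew)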

    After the installation, verify that everything is working by typing this command in the terminal or cmd:

    $ tesseract --version

    And you will see an output similar to:

    tesseract 4.0.0
    leptonica-1.76.0
    libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.8
    Found AVX2
    Found AVX
    Found SSE
    

    You can install the Python wrapper for tesseract after this using pip:

    $ pip install pytesseract

    The Tesseract library is shipped with a handy command-line tool called tesseract. We can use this tool to perform OCR on images, with the output stored in a text file. If we want to integrate Tesseract into our C++ or Python code, we will use Tesseract’s API.

    Running Tesseract with CLI

    To call the Tesseract engine on an image at image_path and write the recognized text line by line to the command prompt, type the following:

    $ tesseract image_path stdout

    To write the output text to a file:

    $ tesseract image_path text_result.txt

    To specify the language model name, write the language shortcut after the -l flag; by default it takes English:

    $ tesseract image_path text_result.txt -l eng

    By default, Tesseract expects a page of text when it segments an image. If you're just seeking to OCR a small region, try a different segmentation mode, using the --psm argument. There are 14 modes available which can be found here. By default, Tesseract fully automates the page segmentation but does not perform orientation and script detection. To specify the parameter, type the following:

    $ tesseract image_path text_result.txt -l eng --psm 6

    There is also one more important argument, the OCR engine mode (oem). Tesseract 4 has two OCR engines — the legacy Tesseract engine and the LSTM engine. There are four modes of operation, chosen using the --oem option (see the example after the list).
    0    Legacy engine only.
    1    Neural nets LSTM engine only.
    2    Legacy + LSTM engines.
    3    Default, based on what is available.
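
    For example, to run only the LSTM engine together with an explicit segmentation mode (the flag values here are illustrative):

    $ tesseract image_path text_result.txt -l eng --oem 1 --psm 6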

    Result of the Tesseract OCR engine

    OCR with Pytesseract and OpenCV

    Pytesseract or Python-tesseract is an OCR tool for Python that also serves as a wrapper for the Tesseract-OCR Engine. It can read and recognize text in images and is commonly used in Python OCR image-to-text use cases.

    It is also useful as a stand-alone invocation script for tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
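
    As a quick sketch of that flexibility, pytesseract can be fed a Pillow image directly (the file name ocr-test.png below is a hypothetical example):

    from PIL import Image
    import pytesseract

    # image_to_string accepts PIL Image objects as well as numpy arrays
    print(pytesseract.image_to_string(Image.open('ocr-test.png')))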

    More info about the Python approach can be read here. The code for this tutorial can be found in this repository.

    import cv2
    import pytesseract

    img = cv2.imread('image.jpg')

    # Adding custom options
    custom_config = r'--oem 3 --psm 6'
    pytesseract.image_to_string(img, config=custom_config)

    Preprocessing for Tesseract

    To avoid all the ways your tesseract output accuracy can drop, you need to make sure the image is appropriately pre-processed.

    This includes rescaling, binarization, noise removal, deskewing, etc.

    To preprocess an image for OCR, use any of the following Python functions, or follow the OpenCV documentation.

    import cv2
    import numpy as np

    img = cv2.imread('image.jpg')

    # get grayscale image
    def get_grayscale(image):
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # noise removal
    def remove_noise(image):
        return cv2.medianBlur(image, 5)

    # thresholding
    def thresholding(image):
        return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

    # dilation
    def dilate(image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.dilate(image, kernel, iterations=1)

    # erosion
    def erode(image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.erode(image, kernel, iterations=1)

    # opening - erosion followed by dilation
    def opening(image):
        kernel = np.ones((5, 5), np.uint8)
        return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

    # canny edge detection
    def canny(image):
        return cv2.Canny(image, 100, 200)

    # skew correction
    def deskew(image):
        coords = np.column_stack(np.where(image > 0))
        angle = cv2.minAreaRect(coords)[-1]
        if angle < -45:
            angle = -(90 + angle)
        else:
            angle = -angle
        (h, w) = image.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
        return rotated

    # template matching
    def match_template(image, template):
        return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    

    Let's work with an example to see things better. This is what our original image looks like -

    The Aurebesh writing system

    After preprocessing with the following code

    image = cv2.imread('aurebesh.jpg')

    gray = get_grayscale(image)
    thresh = thresholding(gray)
    opening = opening(gray)
    canny = canny(gray)
    

    and plotting the resulting images, we get the following results.

    The image after preprocessing

    The output for the original image looks like this -

    GALACTIC BASIC
    [AUREBESH]
    
    RE HFVMEVEIiZwoyv Ze
    ABC DE F GH I JK LM
    N—0- PQ RST Uv WX
    2 | Ff 8 G& Pf fF § 5 op 7
    ee
    5, jf FF Ty ee ee
    =
    334 477 OED
    

    Here's what the output for different preprocessed images looks like -

    Canny edge image (not so good) -

    CAE Cn Cae AS
    [AUREBESE]
    
    EA Na
    oe SS
    [Ne CI [ENE
    a, ee oe ea
    2
    a a A: rc
    |, |
    a
    Sear eo/e
    
    ecm emclomt Cia cuoomct mi im
    

    Thresholded image -

    GALACTIC BASIC
    [AVREBESH]
    RS 7FVMeEVEi1iFf o£
    A B C D EF GH IJ K LM
    AOoder7Nnvroroava
    N O P Q@R S$ TU VW XK Y¥ Z
    7 ee For 8 Ro Pf F Boao om #
    0 12 3 4 5 6 7 8 9 , . !
    >» 1kr7 @ by FEN
    2? S$ [ Por Foy of ee
    ASGSANDIE
    CH AE EO KH NG OO SH TH
    

    Opening image -

    GALACTIC BASIC
    [AUREZEBELSH]
    KEE VTMEUOU EB iw oN es
    A BC D EF F @ H | J K LT Ww
    AOGdrcrT7WTt HYOAVa4
    WO P Q R BS T U VW WK y Z
    i J
    Oo 1 2 3 46 8 7 SC Ps,
    VY ir- -rp,ptUuY?
    a a a
    AGoOAnNnoOID
    CH AE BO KH ®@ OO SH TH
    

    Getting boxes around text

    Using Pytesseract, you can get the bounding box information for your OCR results using the following code.

    The script below will give you bounding box information for each character detected by tesseract during OCR.

    import cv2
    import pytesseract

    img = cv2.imread('image.jpg')

    h, w, c = img.shape
    boxes = pytesseract.image_to_boxes(img)
    for b in boxes.splitlines():
        b = b.split(' ')
        img = cv2.rectangle(img, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), (0, 255, 0), 2)

    cv2.imshow('img', img)
    cv2.waitKey(0)
    

    If you want boxes around words instead of characters, the function image_to_data will come in handy. You can use the image_to_data function with the output type specified via the pytesseract Output class.

    Have an OCR problem in mind? Want to digitize invoices, PDFs or number plates? Head over to Nanonets and build online OCR models for free!

    We will use the sample invoice image above to test out our tesseract outputs.

    import cv2
    import pytesseract
    from pytesseract import Output

    img = cv2.imread('invoice-sample.jpg')

    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    print(d.keys())
    

    This should give you the following output -
    dict_keys(['level', 'page_num', 'block_num', 'par_num', 'line_num', 'word_num', 'left', 'top', 'width', 'height', 'conf', 'text'])

    Using this dictionary, we can get each detected word, its bounding box information, the text in it, and its confidence score.

    You can plot the boxes by using the code below -

    n_boxes = len(d['text'])
    for i in range(n_boxes):
        if int(d['conf'][i]) > 60:
            (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('img', img)
    cv2.waitKey(0)
    

    Here's what this would look like for the image of a sample invoice.

    Text template matching

    Take the example of trying to find where a date is in an image. Here our template will be a regular expression pattern that we will match with our OCR results to find the appropriate bounding boxes. We will use the re module and the image_to_data function for this.

    import re
    import cv2
    import pytesseract
    from pytesseract import Output

    img = cv2.imread('invoice-sample.jpg')
    d = pytesseract.image_to_data(img, output_type=Output.DICT)
    keys = list(d.keys())

    date_pattern = r'^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[012])/(19|20)\d\d$'

    n_boxes = len(d['text'])
    for i in range(n_boxes):
        if int(d['conf'][i]) > 60:
            if re.match(date_pattern, d['text'][i]):
                (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
                img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow('img', img)
    cv2.waitKey(0)
    

    As expected, we get one box around the invoice date in the image.

    Page segmentation modes

    There are several ways a page of text can be analysed. The Tesseract API provides several page segmentation modes if you want to run OCR on only a small region, in different orientations, etc.

    Here's a list of the supported page segmentation modes by tesseract -

    0    Orientation and script detection [OSD] only.
    1    Automatic page segmentation with OSD.
    2    Automatic page segmentation, but no OSD, or OCR.
    3    Fully automatic page segmentation, but no OSD. [Default]
    4    Assume a single column of text of variable sizes.
    5    Assume a single uniform block of vertically aligned text.
    6    Assume a single uniform block of text.
    7    Treat the image as a single text line.
    8    Treat the image as a single word.
    9    Treat the image as a single word in a circle.
    10    Treat the image as a single character.
    11    Sparse text. Find as much text as possible in no particular order.
    12    Sparse text with OSD.
    13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

    To change your page segmentation mode, change the --psm argument in your custom config string to any of the above mentioned mode codes.
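
    For instance, here is a minimal sketch (assuming a sparse-text image saved as image.jpg) that switches to PSM 11 to pick up scattered text:

    import cv2
    import pytesseract

    img = cv2.imread('image.jpg')

    # --psm 11: find as much text as possible, in no particular order
    custom_config = r'--psm 11'
    print(pytesseract.image_to_string(img, config=custom_config))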

    Detect orientation and script

    You can detect the orientation of text in your image and also the script in which it is written. The following image -


    after running through the following code -

    osd = pytesseract.image_to_osd(img)
    # parse the rotation angle and script name out of the OSD report
    angle = re.search(r'(?<=Rotate: )\d+', osd).group(0)
    script = re.search(r'(?<=Script: )\w+', osd).group(0)
    print("angle:", angle)
    print("script:", script)
