Optical Character Recognition (OCR)
Text Recognition with Tesseract
RASPBERRY PI COURSE GUIDE
thingsRoam Academy Contact: +92-308-1222240 academy.thingsroam.com
Email: academy@thingsroam.com
Optical Character Recognition (OCR):
OCR stands for Optical Character Recognition. In other words, an OCR system transforms a two-dimensional image of text, which may contain machine-printed or handwritten characters, from its image representation into machine-readable text. To perform as accurately as possible, OCR generally consists of several sub-processes (a minimal preprocessing sketch follows the list):
Preprocessing of the Image
Text Localization
Character Segmentation
Character Recognition
Post Processing
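As an illustration of the preprocessing step, the short sketch below converts an input image to grayscale and binarizes it with Otsu's threshold using OpenCV, before any recognition takes place. The file name sample.jpg is only a placeholder; replace it with any scanned page.

import cv2

# Placeholder file name; replace with your own scanned image
img = cv2.imread('sample.jpg', cv2.IMREAD_COLOR)

# Convert to grayscale, then binarize with Otsu's threshold
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Save the cleaned-up image for the later localization and recognition stages
cv2.imwrite('sample_preprocessed.jpg', binary)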
What is Tesseract OCR?
Tesseract is an open-source text recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly from the command line or, for programmers, through an API to extract printed text from images. It supports a wide variety of languages.
It can be used with its built-in layout analysis to recognize text within a large document, or it can be used in conjunction with an external text detector to recognize text from an image of a single text line.
Tesseract 4.00 includes a new neural network subsystem configured as a text line
recognizer. To recognize an image containing a single character, we typically use a
Convolutional Neural Network (CNN). Text of arbitrary length is a sequence of characters, and such sequence problems are solved using Recurrent Neural Networks (RNNs); the LSTM is a popular form of RNN.
Legacy Tesseract 3.x depended on a multi-stage process in which we can distinguish the following steps:
Word finding
Line finding
Character classification
To install Tesseract on a laptop, use the following commands in the Anaconda Command Prompt. Make sure you are in the same environment in which OpenCV is installed.
conda install -c conda-forge tesseract
conda install -c conda-forge pytesseract
To install Tesseract on a Raspberry Pi, type the following commands in the Raspberry Pi terminal (CLI). Make sure you are in the same environment in which OpenCV is installed.
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
sudo pip install pytesseract
To check Tesseract's installation, type the following command in the terminal:
tesseract --version
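As an additional sanity check from Python (a small sketch, assuming pytesseract is already installed), the snippet below asks pytesseract which version of the Tesseract binary it can see; it raises TesseractNotFoundError if the binary is not on the system PATH.

import pytesseract

# Prints the version of the tesseract binary that pytesseract found
print(pytesseract.get_tesseract_version())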
Code for Text Recognition from a Saved Picture:
import pytesseract
from PIL import Image
import cv2

# Read the saved image and convert it to grayscale to reduce detail
img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Smooth noise while preserving edges (text strokes)
gray = cv2.bilateralFilter(gray, 11, 17, 17)

# Run Tesseract on the preprocessed image and print the recognized text
original = pytesseract.image_to_string(gray, config='')
print(original)
Before running the above code, make sure that you have saved a JPEG image named para.jpg in your working folder, since the code reads 'para.jpg' with cv2.imread().
If we want to convert the recognized text into speech, we need a text-to-speech (TTS) converter. For that we can install pyttsx3 with the following steps:
1. Go to the Anaconda prompt and type conda install pip. This will install pip in the current conda environment.
2. After step 1, type pip install pyttsx3.
To check the installation, run the code below in your Jupyter Notebook; you should hear a voice saying 'I will speak this text'.
import pyttsx3

engine = pyttsx3.init()                # initialize the text-to-speech engine
engine.say("I will speak this text")   # queue the phrase to be spoken
engine.runAndWait()                    # block until speaking finishes
Now, by adding a few extra lines of code, we can convert the recognized text into speech, combining the OCR and TTS techniques.
import pytesseract
from PIL import Image
import cv2
import pyttsx3

# Initialize the text-to-speech engine
engine = pyttsx3.init()

# Read the saved image and convert it to grayscale to reduce detail
img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.bilateralFilter(gray, 11, 17, 17)

# Recognize the text, print it, then speak it aloud
original = pytesseract.image_to_string(gray, config='')
print(original)
engine.say(original)
engine.runAndWait()
You can pass three important flags to Tesseract: -l, --oem, and --psm.
The -l flag controls the language of the input text.
The --oem argument, or OCR Engine Mode, controls the type of algorithm used by Tesseract.
The --psm argument controls the automatic Page Segmentation Mode used by Tesseract.
These flags are passed through the config argument of pytesseract's image_to_string() method (the call used in the second-to-last line of the first code example):
config = "-l eng --oem 1 --psm 7"
original = pytesseract.image_to_string(gray, config=config)
By default, Tesseract expects a page of text when it segments an image. If you are only seeking to OCR a small region, try a different segmentation mode using the --psm argument. There are 14 modes available, which are listed below. By default, Tesseract fully automates the page segmentation but does not perform orientation and script detection.
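For example, the sketch below crops a small region of the grayscale image with NumPy slicing and recognizes it as a single text line using --psm 7. The crop coordinates are placeholders chosen only for illustration; adjust them to the region you actually want to read.

import cv2
import pytesseract

img = cv2.imread('para.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Placeholder coordinates: rows 50-100, columns 0-400 of the image
roi = gray[50:100, 0:400]

# --psm 7 treats the cropped region as a single line of text
line_text = pytesseract.image_to_string(roi, config="--psm 7")
print(line_text)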
There is also one more important argument, OCR engine mode (oem). Tesseract 4 has
two OCR engines — Legacy Tesseract engine and LSTM engine. There are four modes
of operation chosen using the --oem option.
OEM Mode:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
Page segmentation modes
There are several ways a page of text can be analysed. The Tesseract API provides several page segmentation modes if you want to run OCR on only a small region, in different orientations, etc.
Here is a list of the page segmentation modes supported by Tesseract:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
To change your page segmentation mode, change the --psm argument in your custom
config string to any of the above mentioned mode codes.
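As a small illustrative sketch, the lines below switch the same recognition call to sparse-text mode (--psm 11), which searches for as much text as possible in no particular order; the image name para.jpg is reused from the earlier examples.

import cv2
import pytesseract

gray = cv2.cvtColor(cv2.imread('para.jpg'), cv2.COLOR_BGR2GRAY)

# --psm 11: sparse text, find as much text as possible in no particular order
sparse_text = pytesseract.image_to_string(gray, config="--psm 11")
print(sparse_text)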
Code for Text Recognition with Raspberry Pi Camera:
import cv2
import pytesseract
from picamera.array import PiRGBArray
from picamera import PiCamera

# Set up the Pi camera
camera = PiCamera()
camera.resolution = (640, 480)
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(640, 480))

# Capture frames continuously from the camera
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    cv2.imshow("Frame", image)
    key = cv2.waitKey(1) & 0xFF

    # Clear the stream for the next frame
    rawCapture.truncate(0)

    # Press 's' to run OCR on the current frame
    if key == ord("s"):
        text = pytesseract.image_to_string(image)
        print(text)
        cv2.imshow("Frame", image)
        cv2.waitKey(0)
        break

cv2.destroyAllWindows()