
TensorFlow and Tesseract OCR: Two Popular AI/ML Tools

The article gives an overview of two popular AI and ML tools, TensorFlow and Tesseract OCR, both of which are backed by Google for the open source community.


Google AI Labs provides many services for AI and ML. These include the use of free platforms for development activities, releasing code to the open source community, and support for AI/ML related research activities.

TensorFlow is a Google AI project and one of the most popular open source machine learning frameworks. It can be used to build and train ML models through high-level interfaces such as the Keras API, and it supports building neural networks for tasks like machine translation and video processing.
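To give a feel for how this works, here is a minimal sketch of building and training a small model with the Keras API bundled in TensorFlow; the layer sizes and the synthetic data are purely illustrative assumptions.

import numpy as np
import tensorflow as tf

# Illustrative synthetic data: 100 samples with four features, binary labels
x_train = np.random.rand(100, 4).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))

# Build a small feed-forward network with the Keras API bundled in TensorFlow
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Compile with a standard optimiser and loss, then train for a few epochs
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=16)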

Tesseract OCR is another popular open source character recognition and OCR library, written in C and C++. It was originally developed as a commercial OCR package by HP Laboratories in the 1990s to run in DOS command-line mode, and was later enhanced in C++ to work on the Windows OS. In 2005, HP Laboratories released it as an open source framework available for community development (it is now hosted on GitHub), and the Google AI project subsequently sponsored its development. It is still available under the Apache License 2.0 for open source applications.

TensorFlow and its features

TensorFlow is built on neural network models and supports many popular algorithms like clustering, decision trees, neural network evaluation, linear regression, and the Naïve Bayes classifier. It is very popular for large volume machine learning activities and supports both CPU and GPU based processing, which can help to optimise operational costs when building machine learning solutions.

There are many frameworks that support building a machine learning model, and their flexibility of usage varies. For example, scikit-learn has many inbuilt libraries for machine learning algorithms and their implementation, whereas TensorFlow is more flexible, as it allows the development and implementation of custom ML algorithms that integrate easily with other TensorFlow libraries.
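As a small illustration of that flexibility, the sketch below fits a hand-rolled linear model with tf.GradientTape instead of a prebuilt estimator; the toy data and learning rate are assumptions made for the example.

import tensorflow as tf

# Hypothetical toy data for y = 3x + 2 with a little noise
x = tf.random.uniform((64, 1))
y = 3.0 * x + 2.0 + tf.random.normal((64, 1), stddev=0.05)

# Trainable parameters of a custom linear model
w = tf.Variable(tf.random.normal((1, 1)))
b = tf.Variable(tf.zeros((1,)))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(200):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, w) + b                   # forward pass
        loss = tf.reduce_mean(tf.square(y - y_pred))   # mean squared error
    grads = tape.gradient(loss, [w, b])                # automatic differentiation
    optimizer.apply_gradients(zip(grads, [w, b]))

print("learned w:", w.numpy().ravel(), "learned b:", b.numpy())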

There is now a more sophisticated option available for such heavy workloads: Google's purpose-built accelerator, the Cloud Tensor Processing Unit (TPU), which is specifically designed for building and serving TensorFlow style machine learning models. It helps to train ML models and use them for real-time business solutions like data analytics and forecasting.

TPU based computing machines are specially designed for ML services such as TensorFlow or PyTorch based solutions. They do not rely on a general-purpose CPU or RAM; instead, they have a scalar unit, a vector processing unit (VPU) and a matrix multiplication unit (MXU) for algorithmic processing and calculation activities, along with high-bandwidth memory (HBM) in place of RAM for memory operations.
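For completeness, the following sketch shows how a Keras model is typically placed on a Cloud TPU through TensorFlow's distribution strategy API. It assumes a TPU-enabled environment (such as Colab or a Cloud TPU VM) and a recent TensorFlow 2.x release; the model itself is a placeholder.

import tensorflow as tf

# Attach to a Cloud TPU; the tpu argument is environment-specific
# ("" works on Colab, otherwise pass the TPU name or gRPC address)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)  # tf.distribute.experimental.TPUStrategy on older TF 2.x

# Variables created inside the strategy scope are placed on the TPU cores
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
# model.fit(...) would then distribute training across the TPU cores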

It can be seen in Figure 1 that TensorFlow has multiple components, including core components, interfacing components and add-on libraries. Let's understand these better.

TF servables: These are the central computational units that perform machine learning operations; they are flexible enough to support asynchronous modes of operation, streaming results and experimental APIs.

Servable versions: These are used to maintain one or more versions of servable units. A TensorFlow serving application can use one or more versions for various computational units, each serving a different algorithm.

Servable streams: A servable stream is the repository of sequentially ordered versions of a servable. During the bootstrap of TensorFlow loading, the required servable version is looked up from this stream for evaluation.

TF models: For a given machine learning task, a TensorFlow model is created by looking up a specific servable unit with a suitable version. This is further used for evaluation and processing activities.

TF loaders: This is the bootstrapping component or API that manages the life cycle of servables. Using its standardised loader API, loading and unloading (bootstrap and offload) of servables can be done.

TF sources: These are plugin modules that find servables in a servable stream, provide them through TF models, and supply a servable when it is requested by a client.

TF manager: This is the controlling unit of the TensorFlow application. It monitors runtime performance, tunes the CPU/TPU/GPU configuration to cater to request handling, and manages the life cycle of servables (startup, processing, shutdown) as well as other logging and auditing activities.

TF core: This handles servables and loaders together for runtime execution, and also manages their life cycle and the metrics used to collect usage statistics.

Batcher: This is the processing unit of the TensorFlow application; by grouping multiple requests into a single batch, it makes better use of TPU/GPU hardware for improved performance and optimised cost. It acts as a control plane that handles multiple requests at a time and creates machine learning tasks through TF loaders.

TF API: This provides peripheral functionality, such as the Estimator and Keras API libraries, to support machine learning activities like OCR and image based character recognition. It is available out of the box and is easy to hook into other applications through an API interface.

TF abstraction layer: These are the components used to build neural network models, which are developed by training the machine learning model. They provide loss functions for classification, and metrics for evaluation and benchmarking.

Python Core API access: This provides language bindings that allow external applications written in languages like Java, Go and C/C++ to invoke TensorFlow functionality alongside the core Python API.
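To connect these pieces to something concrete, here is a minimal sketch of exporting a trained Keras model as a numbered SavedModel directory, which is the form TensorFlow Serving loads as one servable version; the model and paths are hypothetical.

import tensorflow as tf

# A trivial stand-in for a trained model (hypothetical)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# TensorFlow Serving treats each numbered sub-directory as one servable version,
# e.g. /tmp/demo_model/1, /tmp/demo_model/2, and so on
export_path = "/tmp/demo_model/1"   # hypothetical path
tf.saved_model.save(model, export_path)

# Pointing tensorflow_model_server at /tmp/demo_model would load version 1 as a
# servable, and newer numbered directories would be picked up as new versions.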

TensorFlow usage

TensorFlow is a great tool with countless benefits. The following are some of its use cases.

a. Voice/sound recognition: Deep learning has contributed a lot to voice and sound recognition. Neural networks, given a proper input data feed, can understand audio signals. Voice recognition is mostly used in the Internet of Things, automotive, security and UX/UI (user experience/user interface) applications. There are also the voice-search and voice-activated assistants on smartphones, such as Apple's Siri.

b. Text based applications: Text based applications such as sentiment analysis, threat detection and fraud detection are common applications backed by TensorFlow in real time. For example, Google Translate supports over 100 languages. Another use case of deep learning is text summarisation. Google found that summarisation can be done using a deep learning technique called sequence-to-sequence (S2S) learning; indeed, this S2S technique can be used to produce headlines for news articles. SmartReply is another Google use case, which automatically generates e-mail responses.

c. Image recognition: Image recognition is used for face recognition, image search, motion detection, machine vision, and photo clustering. It can also be used in the automotive, aviation, and healthcare industries. The advantage of using TensorFlow for object recognition algorithms is that it helps to classify and identify arbitrary objects within larger images. This is used in engineering applications to identify shapes for modelling purposes (or 3D space reconstruction from 2D images), and by Facebook for photo tagging (Facebook's DeepFace). As an example, TensorFlow is used in deep learning for analysing thousands of photos of dogs to help identify a particular one.

d. Video detection: Deep learning algorithms, these days, are also used for video detection, such as motion detection, real-time threat detection in gaming, security screening at airports, and so on. For example, NASA is developing a deep learning system for the orbit classification and object clustering of asteroids, to classify and predict NEOs (near earth objects).

TensorFlow applications

The following are various TensorFlow applications developed by the open source community and used in diverse ways.

a. DeepSpeech: DeepSpeech is a voice-to-text engine and library, making it useful for those who need to transform voice input into text and for developers who want to provide voice input for their applications.

It is composed of two main subsystems: an acoustic model and a decoder. The acoustic model is a deep neural network that receives audio features as inputs and outputs character probabilities. The decoder uses a beam search algorithm to transform the character probabilities into textual transcripts that are then returned by the system.

b. RankBrain: RankBrain is a component of Google's core algorithm, based on TensorFlow, which uses machine learning to determine the most relevant results for search engine queries. Before RankBrain, Google utilised its basic algorithm to determine which results to show for a given query. Post-RankBrain, it is believed that the query now goes through an interpretation model that can apply possible factors like the location of the searcher, personalisation, and the words of the query to determine the searcher's true intent.

c. Inception v3: Inception v3 is a pretrained convolutional neural network that is 48 layers deep; this version of the network has already been trained on more than a million images from the ImageNet database. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 299-by-299. The model extracts general features from input images in the first part and classifies them based on those features in the second part.

In addition, TensorFlow can also be used with containerisation tools such as Docker. For instance, it can be used to deploy a sentiment analysis model that uses character level ConvNet networks for text classification. TensorRec is another cool recommendation engine framework built on TensorFlow. It has an easy API for training and prediction, and resembles common machine learning tools in Python.
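As a quick illustration of the pretrained Inception v3 network described above, the sketch below classifies a single image using the ImageNet weights shipped with tf.keras.applications; the image path is a placeholder.

import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image

# Load Inception v3 with weights pretrained on ImageNet (1000 classes)
model = InceptionV3(weights="imagenet")

# "dog.jpg" is a placeholder path; the network expects 299-by-299 inputs
img = image.load_img("dog.jpg", target_size=(299, 299))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Print the top three predicted ImageNet categories with their scores
preds = model.predict(x)
for _, label, score in decode_predictions(preds, top=3)[0]:
    print(label, round(float(score), 3))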

Tesseract OCR and its execution process

Tesseract is a neural network based OCR engine. It can recognise more than 100 languages from images such as PNG, JPG and TIFF files, and generates output in various formats like HTML, PDF and plain text, to name a few. It is a library that can be used with other applications, and can also be used as a standalone command-line tool, although it has no built-in GUI.

There are many third-party tools available that can be used as GUI front-ends integrated with Tesseract. Tesseract OCR uses the third-party Leptonica library, under the BSD 2-clause licence, to support reading images in formats such as PNG and TIFF (with the help of supporting libraries like zlib).

Tesseract OCR takes an input image from a scanned source and passes it to pre-processing, where it is cleaned of distortion to make it as clear as possible using noise removal and related pixel-level methods, resized for better readability and interpretation by the algorithm, and rotated (if needed) for proper character interpretation.

The OCR engine then uses suitable algorithms, including the Leptonica library, to process the image by splitting it into logical lines of text. Each line of text is then split into words using spaces as breakers, and the words are in turn split into characters using their logical shapes. These are then used to interpret each character, word and sentence. The final interpreted group of characters is given to the post-processor, which assembles the words and sentences as per the split layout of the source image.
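The result of that line/word segmentation can be inspected from Python through the Pytesseract wrapper mentioned later in this article; the following sketch assumes Tesseract is installed and uses a hypothetical input file name.

import pytesseract
from PIL import Image

# "scanned_page.png" is a placeholder for any scanned input image
img = Image.open("scanned_page.png")

# image_to_data exposes the engine's segmentation: block, line and word
# numbers along with bounding boxes and per-word confidence values
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

for i, word in enumerate(data["text"]):
    if word.strip():
        print("line", data["line_num"][i], "word", data["word_num"][i],
              "conf", data["conf"][i], "->", word)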

Throughout this processing, the Tesseract OCR engine uses its trained data set, and its neural network keeps improving whenever reprocessing and processing corrections are carried out. More usage and more corrections help to build better accuracy.

Tesseract 4.0, the latest version, has improved text line recognition by interpreting line separators from the gaps between lines. In general, Tesseract uses adaptive binarisation for character recognition from the binary form of an image; for images containing a single character, it uses convolutional neural networks (CNNs).

Tesseract was developed using C++, and if you want to integrate it with Python applications you can use Pytesseract (Python Tesseract), which is a Python based wrapper for the Tesseract OCR library. Tesseract has a built-in facility for language detection: when it gets a scanned image, it loads the appropriate language training set for processing. The training sets for different languages are available on GitHub (see the project's data files) as .traineddata files, and are placed in the $TESSDATA_PREFIX folder in the Tesseract installation location.
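A minimal Pytesseract sketch is shown below; the input file name is hypothetical, and the lang value must correspond to a .traineddata file available under $TESSDATA_PREFIX.

import pytesseract
from PIL import Image

# "invoice_scan.png" is a placeholder; the 'lang' value must match a
# .traineddata file in the directory that $TESSDATA_PREFIX points to
text = pytesseract.image_to_string(
    Image.open("invoice_scan.png"),
    lang="eng",                # e.g. "hin", or "eng+hin" for multiple languages
    config="--oem 1 --psm 6",  # LSTM engine; assume a single uniform block of text
)
print(text)

# Other output formats are also available, for example a searchable PDF:
# pdf_bytes = pytesseract.image_to_pdf_or_hocr(Image.open("invoice_scan.png"), extension="pdf")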

This concludes our overview of two powerful AI tools: TensorFlow and Tesseract OCR.

Figure 1: TensorFlow component view
Figure 2: Execution process of Tesseract OCR
