Version: Web: https://github.com/manisandro
Accounting, formal office paperwork, library services and, of course, maintaining your own digital archive of historic documents and publications – this is just a short list of applications where optical character recognition (OCR) is welcome. The idea of extracting text from a scanned bitmap image became popular with the rise of home flatbed scanners in the 1990s (ancient times in computing terms), particularly thanks to the commercial ABBYY FineReader software. In Linux, we have an analogue to FineReader, known as Tesseract. This is a community effort to bring professional-quality OCR to Linux, and we must admit, it works just fine. The hero of this review is a graphical frontend to Tesseract, which allows everyone to scan and extract text data from any paper document. gImageReader is a sleek and easy-to-use application that spares you from having to deal with Tesseract via the command line. Don’t get confused by that initial ‘g’ – it simply means ‘graphical’, and depending on your desktop of choice, you may want to use either the GTK3 or Qt5 version of gImageReader, both of which are officially supported.
The application doesn’t have too many controls and configurables, so it is quite friendly to newcomers. You can import bitmap files or, if you have a physical scanning device, scan directly from gImageReader. Remarkably, gImageReader distinguishes real scanners from the list of available V4L devices – so, unlike many other multimedia apps in Linux, this one ignores your webcam and shows only genuine scanners.
For the recognition engine to work correctly with your language, you must make sure you’ve installed the appropriate language packages for Tesseract; otherwise gImageReader produces iffy results. Luckily, Tesseract supports over 100 languages and writing systems, so you just need to check your package manager and install the required parts.
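If you would rather check from a script which language data Tesseract can already see, something like the following rough sketch may help. It relies on Tesseract's `--list-langs` option; the helper names (`parse_langs`, `installed_langs`, `missing_langs`) and the example language codes are our own illustrations and not part of gImageReader or Tesseract. The exact wording of the header line varies between Tesseract releases, so the parser simply skips any line beginning with "List of".

```python
import shutil
import subprocess


def parse_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def installed_langs() -> list[str]:
    """Ask the local tesseract binary (if any) which language data it can find."""
    if shutil.which("tesseract") is None:
        return []  # Tesseract itself is not installed
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Assumption: depending on the release, the list may arrive on stdout
    # or stderr, so both streams are parsed. Any warning lines printed by
    # Tesseract would be picked up too, so treat the result as a hint.
    return parse_langs(result.stdout + "\n" + result.stderr)


def missing_langs(wanted: list[str], available: list[str]) -> list[str]:
    """Return the wanted language codes that are not in the available list."""
    return [code for code in wanted if code not in available]


if __name__ == "__main__":
    wanted = ["eng", "deu"]  # example codes only; use the ones you need
    print("Missing language data:", missing_langs(wanted, installed_langs()))
```

Anything reported as missing then needs to be installed through your package manager; the package names differ between distributions.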
The result is editable text that you can copy and paste into any other application, such as LibreOffice Writer, Scribus and so on.
Check predefined language definitions to make sure that gImageReader will work correctly.