gImageReader

Version: Web: https://github.com/manisandro

Linux Format - LXFHOTPICKS

Accounting, formal office paperwork, library services and, of course, maintaining your own digital archive of historic documents and publications – this is just a short list of applications where optical character recognition (OCR) is welcome. The idea of extracting text from a scanned bitmap image became popular with the rise of home flatbed scanners in the 1990s (ancient times in computing terms), particularly thanks to the commercial ABBYY FineReader software. On Linux, we have an analogue to FineReader, known as Tesseract. This is a community effort to bring professional-quality OCR to Linux, and we must admit, it works just fine.

The hero of this review is a graphical frontend to Tesseract, which allows everyone to scan and extract text data from any paper document. gImageReader is a sleek and easy-to-use application that saves you from having to deal with Tesseract via the command line. Don’t get confused by that initial ‘g’ – it simply means ‘graphical’, and depending on your desktop of choice, you may want to use either the GTK3 or Qt5 version of gImageReader, both of which are officially supported.
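For the curious, here is roughly what the command-line route that gImageReader spares you looks like. This is a minimal Python sketch, not anything taken from gImageReader itself: the helper names `build_tesseract_command` and `ocr_image` are made up for illustration, and it assumes the `tesseract` binary is on your PATH with the chosen language model installed.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for a basic Tesseract run (hypothetical helper)."""
    # Passing "stdout" as the output base makes Tesseract print the text
    # instead of writing it to <base>.txt; -l selects the language model.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Assumes the tesseract binary and the requested language data are installed.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return result.stdout
```

Calling `ocr_image("scan.png", "deu")` would run Tesseract on a German-language scan and hand back the text as a string, which is the plumbing a graphical frontend hides behind a button.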

The application has few controls and settings, which makes it quite friendly to newcomers. You can import bitmap files or, if you have a physical scanning device, scan directly from gImageReader. Remarkably, gImageReader picks out real scanners from the list of available V4L devices – so, unlike many other multimedia apps in Linux, this one ignores your webcam and shows only genuine scanners.
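How gImageReader tells scanners from webcams internally isn't covered here, but the idea is easy to sketch. The Python below assumes device names as reported by SANE's `scanimage` tool, where webcams exposed through the V4L backend carry a `v4l:` prefix; the function names are hypothetical, and this is an illustration of the filtering idea rather than the application's actual logic.

```python
import subprocess


def parse_device_names(listing):
    """Split a one-name-per-line device listing into a list of names."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def genuine_scanners(device_names):
    """Drop devices served by the V4L backend (assumed to be webcams)."""
    return [name for name in device_names if not name.startswith("v4l:")]


def list_sane_devices():
    """Ask scanimage for device names, one per line.

    Assumes SANE's scanimage is installed; %d is the device name
    and %n a newline in its formatted device list option.
    """
    result = subprocess.run(
        ["scanimage", "-f", "%d%n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_device_names(result.stdout)
```

On a laptop with a built-in camera and a USB scanner, `genuine_scanners(list_sane_devices())` should leave just the scanner's entry.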

In order for the recognition engine to work correctly for your language, you must make sure you’ve installed the appropriate language packages for Tesseract; otherwise, gImageReader produces iffy results. Luckily, Tesseract supports over 100 languages and writing systems, so you just need to check your package manager and install the required parts.
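If you would rather check from a script which language models Tesseract can already see, something like the following Python sketch would do. It relies on Tesseract's `--list-langs` option; the function names are hypothetical, and the parsing assumes the usual output of a header line followed by one language code per line.

```python
import subprocess


def parse_language_list(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Language codes contain no spaces, so this skips the
        # "List of available languages ..." header and blank lines.
        if not line or " " in line:
            continue
        langs.append(line)
    return langs


def missing_languages(wanted, installed):
    """Return the wanted language codes that are not installed."""
    return [code for code in wanted if code not in installed]


def installed_languages():
    """Query Tesseract for its installed languages (assumes it is on PATH)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some older releases print the list to stderr, so read both streams.
    return parse_language_list(result.stdout + result.stderr)
```

For example, `missing_languages(["eng", "deu"], installed_languages())` lists whichever of English and German still needs installing from your package manager.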

The result is editable text that you can copy and paste into any other application, such as LibreOffice Writer, Scribus and so on.

Check the predefined language definitions to make sure that gImageReader will work correctly.
