Convert documents into editable text
Nick Peers reveals how to make use of Google Docs to extract text from scanned images
Certificates and other documents contain a wealth of information about your forebears, but transcribing all of that detail by hand can take a painfully long time. This is where optical character recognition (OCR) comes into its own. OCR enables computers to convert scanned images into editable text. It does a great job of converting typewritten documents, and is increasingly effective with handwritten documents too, although results can still be patchy.
OCR products usually come with a price tag, but thanks to Google’s free tool Docs, you can convert your scanned documents into editable text for free. When you pass your scan through Docs’ OCR engine, it delivers a document with both the image and a transcription beneath it.
This document can then be edited in your browser, or downloaded as a file you can open in a word processor such as Microsoft Word or LibreOffice Writer.
OCR relies on good, clear images, so we’ll show you how to optimise your scans before you submit them to Docs. This boosts your chances of getting a readable transcription that you can then edit into a word-perfect copy of the text.
1
Scan Document If necessary, scan your original document or certificate into your computer using your printer or camera. If scanning, make sure that the image is 300 dpi resolution and either black and white or greyscale, so that the text – whether typed or handwritten – can be read clearly by Google’s OCR engine.
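If you want to check whether an existing scan meets the 300 dpi target, the arithmetic is simple: pixels divided by inches. The short Python sketch below illustrates it. The function names and the 8 x 10 inch certificate are made up for the example, not taken from any scanning software.

```python
def pixels_at_dpi(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Return the pixel size of a scan of a page with the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(pixels_wide: int, width_in: float) -> float:
    """Work out the resolution of an existing image from its pixel width."""
    return pixels_wide / width_in


# Hypothetical example: a certificate measuring 8 x 10 inches.
print(pixels_at_dpi(8, 10))    # (2400, 3000)
print(effective_dpi(1200, 8))  # 150.0, so this image is below the 300 dpi target
```

In other words, an image of that certificate that is only 1,200 pixels wide was captured at 150 dpi and is worth rescanning.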
2
Download A Record If the document is held on a website like ancestry.co.uk or findmypast.co.uk, locate the record and open the image in the site’s image viewer. Look for a download option – click this and save the image to your computer. This should be a sufficiently high resolution to work with Google’s OCR engine.
3
Prepare The Image Next, open your scanned or downloaded image in an image editor like the free Paint.NET (getpaint.net). If necessary, convert it to greyscale (choose ‘Adjustments > Black and White’ in Paint.NET). Crop out any unnecessary detail so that only the text – typed or handwritten – remains.
4
Improve Contrast Levels If the document is a little murky, see if you can adjust its brightness and contrast. Start with Paint.NET’s ‘Adjustments > Brightness/Contrast’ – try pushing both sliders up to make the background as light as possible while keeping the text as black and as sharp as you can.
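To see why raising both sliders helps, it can be useful to look at what happens to individual greyscale values, where 0 is black and 255 is white. The Python sketch below uses a simplified formula that is assumed for illustration only; it is not Paint.NET’s actual calculation, and the function name and sample values are hypothetical.

```python
def adjust_pixel(value: int, brightness: int = 0, contrast: float = 1.0) -> int:
    """Apply a simple brightness/contrast change to one greyscale value (0-255)."""
    # Contrast stretches each value away from mid-grey (128);
    # brightness then shifts everything lighter or darker.
    adjusted = (value - 128) * contrast + 128 + brightness
    # Keep the result inside the valid 0-255 range.
    return max(0, min(255, round(adjusted)))


# Assumed sample values: murky paper at 200, faded ink at 60.
paper, ink = 200, 60
print(adjust_pixel(paper, brightness=20, contrast=1.5))  # 255: paper becomes pure white
print(adjust_pixel(ink, brightness=20, contrast=1.5))    # 46: ink gets darker
```

The gap between paper and ink widens from 140 to 209 levels, which is the kind of separation that makes characters easier for an OCR engine to pick out.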
5
Other Tweaks More experienced users may get better results fixing brightness issues with the ‘Adjustments > Levels’ tool. If your text is slightly out of focus, look for a tool to subtly sharpen the pixels (‘Effects > Sharpen’ in Paint.NET) to make the characters more defined and therefore easier to read.
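A levels adjustment works by choosing a new black point and white point and stretching everything in between. The Python sketch below shows the general idea on a handful of greyscale values. It is a simplified linear version written for illustration, with hypothetical names and sample numbers, and it leaves out the midtone (gamma) control that levels tools usually offer.

```python
def apply_levels(value: int, black_point: int, white_point: int) -> int:
    """Remap one greyscale value so black_point becomes 0 and white_point becomes 255."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (value - black_point) / (white_point - black_point) * 255
    # Anything darker than the black point or lighter than the white point is clipped.
    return max(0, min(255, round(scaled)))


# Assumed sample row of pixels: two ink values followed by two paper values.
row = [70, 90, 180, 200]
print([apply_levels(v, black_point=80, white_point=190) for v in row])
# [0, 23, 232, 255]
```

Setting the black point just above the darkest ink and the white point just below the paper tone pushes the text towards black and the background towards white in a single step.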
6
Upload To Google Drive If Google Backup and Sync is installed on your computer, copy the file into one of your Google Drive folders – it should upload automatically. Otherwise navigate to drive.google.com in your browser, log into your Google account, locate the correct folder, then right-click and choose ‘Upload file’.
7
Perform OCR Go to drive.google.com in your web browser and locate the file you’ve uploaded, then right-click it and choose ‘Open with > Google Docs’. A new browser tab will open. Wait while the file is converted, then the new document will appear in Google Docs.
8
Review Results Your original image will be displayed at the top of the new document; scroll down to see the text as Google Docs’ OCR engine transcribed it. If the initial results are poor, you may want to go back to the original scan in your image editor and try to improve its quality before running it through OCR again.
9
Download Back To Your Computer As the transcription is fully editable, you can correct any errors manually and restyle the text to suit your purposes. When you’ve finished all of your edits, choose ‘File > Download’ and your output format (Microsoft Word, OpenDocument for LibreOffice, and PDF are among those supported) to save a copy to your computer.