APC Australia

Working with poppler-utils


While we’ve discussed a number of utilities that manipulate PDF files in different ways, your Linux distribution has more tools to offer, for yet more operations.

The poppler-utils package is a collection of command-line tools built on the Poppler PDF rendering library. Each of these tools is designed to work with PDF files, and several can extract content from them. You can install the package by running the sudo apt install poppler-utils command if you’re on Debian, Ubuntu or a derivative distribution. Run the sudo dnf install poppler-utils command if you’re on an RPM-based distro such as Fedora.

The package includes many different utilities, such as pdftohtml, which converts a PDF file into HTML while trying to retain the formatting of the original. The command pdftohtml filename.pdf output.html will first save the images in the specified PDF file as individual image files (PNG or JPEG) and then create an index file called output_ind.html as well as output.html.
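As a minimal sketch of the conversion described above, the shell function below wraps pdftohtml with two sanity checks. The function name pdf_to_html and its return codes are our own invention for illustration and are not part of poppler-utils.

```shell
# Convert a PDF to HTML with pdftohtml, after two basic checks.
# pdf_to_html is a hypothetical helper name, not a poppler-utils command.
# Usage: pdf_to_html filename.pdf output.html
pdf_to_html() {
    src=$1
    dest=$2
    # Refuse to continue if the input file does not exist
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Check that pdftohtml is on the PATH before calling it
    if ! command -v pdftohtml >/dev/null 2>&1; then
        echo "pdftohtml not found (it ships in poppler-utils)" >&2
        return 2
    fi
    pdftohtml "$src" "$dest"
}
```

Paste the function into your shell and run pdf_to_html filename.pdf output.html, then list the directory to see which files were written alongside output.html.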

You can similarly use the pdfseparate utility to split a PDF file into single-page files. It covers some of the same ground as pdftk, which we’ve already discussed, and its -f and -l options let you restrict the split to a range of pages. The output filename must contain %d, which is replaced by the page number. Refer to its man page for more details. The pdfunite utility provides the opposite functionality, and can be used to stitch multiple PDFs together into a single file, which is always the last filename on the command line.
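The two tools combine naturally. The sketch below pulls a range of pages out of one PDF and joins them into a new file. The function name extract_range and the temporary file names are assumptions made for this example; only pdfseparate, its -f and -l options, the %d pattern and pdfunite come from poppler-utils.

```shell
# Copy pages first..last of a PDF into a new PDF.
# extract_range is a hypothetical helper name used for illustration.
# Usage: extract_range input.pdf first last output.pdf
extract_range() {
    src=$1; first=$2; last=$3; dest=$4
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    tmpdir=$(mktemp -d) || return 1

    # One file per page: %d becomes the page number
    if pdfseparate -f "$first" -l "$last" "$src" "$tmpdir/page-%d.pdf"; then
        # Build the file list in numeric page order
        # (a plain glob would sort page-10 before page-2)
        set --
        n=$first
        while [ "$n" -le "$last" ]; do
            set -- "$@" "$tmpdir/page-$n.pdf"
            n=$((n + 1))
        done
        # The last argument to pdfunite is the output file
        pdfunite "$@" "$dest"
        status=$?
    else
        status=1
    fi

    rm -rf "$tmpdir"
    return "$status"
}
```

For example, extract_range report.pdf 3 7 excerpt.pdf would write pages 3 to 7 of a file named report.pdf to excerpt.pdf.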

The bottom of the man page for these commands lists the various other utilities that are part of the poppler-utils package such as pdfimages, pdfinfo and pdffonts.
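To get a feel for these three inspection tools, the sketch below runs each of them against one file. The function name pdf_report and the section headings it prints are our own; the commands and the -list option of pdfimages are part of poppler-utils.

```shell
# Print a short report on a PDF using three poppler-utils tools.
# pdf_report is a hypothetical helper name used for illustration.
# Usage: pdf_report input.pdf
pdf_report() {
    src=$1
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }

    echo "== Metadata =="
    pdfinfo "$src"            # document metadata such as page count and page size

    echo "== Fonts =="
    pdffonts "$src"           # fonts used, and whether each one is embedded

    echo "== Images =="
    pdfimages -list "$src"    # lists embedded images without extracting them
}
```

Dropping the -list option and adding an output prefix, as in pdfimages input.pdf img, extracts the images to files instead of listing them.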
