Photos: Tidy up your library
Alexander Tolstoy explores image optimisation, and uncovers new ways to further compress what’s already been squeezed into a small space
Despite the popularity of cloud storage, many Linux users store their image libraries on their hard drives – then wonder how they can optimise the footprint of their files. In this tutorial we tackle a massive photo library that has been growing incrementally for years – an ideal job for server admins who maintain large amounts of user data, or anyone who needs to store lots of images.
We’ll show how to get more free space without compromising image quality, and reveal how to browse your new optimised image library with comfort and convenience. We’ll look at ways of removing duplicate files with simple and reliable tools, then we’ll squeeze extra bits out of existing JPEG and PNG files and finally we’ll bravely convert our images to a couple of next-gen file formats.
Before we dive into the exciting world of novel technologies, let’s start with a more brutal, yet very effective way to make your images take up less space. Let’s say you have a bunch of PNG images that you don’t really need to be that big. You can slightly downscale them and recompress using the lossy JPEG mode, and they’ll still look good enough on-screen:
$ mogrify -path . -resize 70x70% -quality 75 -format jpg *.png
This example uses mogrify from the ImageMagick package, which is available across nearly all Linux distributions. You instruct it to resize all PNG files in the current directory to 70 per cent of their original dimensions and convert them to JPEG at quality 75. This approach is basic and it works only if you can afford to lose a little image quality. If this doesn’t appeal, read on for great tips on lossless magic!
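Because this step is lossy and irreversible, it pays to see what would happen before anything runs. Here’s a minimal dry-run sketch – plan_png_to_jpg is our own name, not part of ImageMagick – that prints one mogrify command per PNG so you can review the plan first:

```shell
# plan_png_to_jpg DIR: print the mogrify command for every PNG in DIR,
# without running anything. Pipe the output to sh to actually execute it.
plan_png_to_jpg() {
    for f in "$1"/*.png; do
        [ -e "$f" ] || continue   # the glob matched nothing: no PNGs here
        printf 'mogrify -path %s -resize 70x70%% -quality 75 -format jpg %s\n' "$1" "$f"
    done
}

# Review first, then run for real:
#   plan_png_to_jpg ~/Pictures
#   plan_png_to_jpg ~/Pictures | sh
```

Note that this simple sketch assumes the directory and file names contain no spaces; quote-safe batching is left as an exercise.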
Duplicate or similar
Duplicate image files are often the result of copying photos from removable media to your hard drive while sorting your images manually into date-based directories. Such directories eventually end up containing partially overlapping sets of images, which isn’t ideal.
Excessive backups and copies of resized images also add to the pile of redundant files. Most Linux distros have the fdupes application in their repositories, which finds byte-identical files, but we often need something more powerful for images.
Image duplicates can have different names and sizes, and we may also want to detect non-identical but very similar images, such as edited ones, or those taken in a continuous burst of shots. Obviously, we could come up with some neural network for solving this problem, but that’s akin to using a sledgehammer to crack a hazelnut.
There’s a much more lightweight – if oddly named – solution called Findimagedupes ( www.github.com/opennota/findimagedupes). This is a Go program that analyses the contents of a given directory and stores image fingerprints in its own database. Findimagedupes compares those fingerprints and presents a list of identical or visually similar sets of images that you may want to weed out. You can adjust the similarity threshold, skip checking for identical images, and search recursively. Here’s an example:
$ ./findimagedupes -R ~/Pictures
This command will search inside ~/Pictures including all subdirectories. The output is a plain list where each line consists of full paths to images divided by spaces. To adjust similarity, use the -t parameter followed by a value from 0 to 63, where 0 means that Findimagedupes will detect only identical images and 63 means it will treat all images as similar. In the following example, we’ve used a sensible and realistic value between the two extremes:
$ ./findimagedupes -R -t 30 ~/Pictures
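Before deleting anything, it can be handy to know how much the list actually represents. This small helper – count_removable is our own name, not part of Findimagedupes – parses the tool’s output (one group of space-separated paths per line) and reports how many files you could remove if you kept one image from each group. It assumes no spaces inside the file names, since the space-separated output is ambiguous otherwise:

```shell
# count_removable GROUPFILE: given saved findimagedupes output, count how
# many files are redundant if one image per group is kept.
count_removable() {
    awk 'NF > 1 { total += NF - 1 } END { print total + 0 }' "$1"
}

# Typical use:
#   ./findimagedupes -R -t 30 ~/Pictures > groups.txt
#   count_removable groups.txt
```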
Right away, it’s not evident what to do next, but luckily Findimagedupes enables you to open each set of images using an external application, like this:
$ ./findimagedupes -R -t 30 -p feh ~/Pictures
We used Feh in this example, because it’s the most uncomplicated tool around for viewing images, but the choice of viewer is up to you. This non-automated method will give the most accurate results, as long as you’re happy to delete duplicates after careful inspection.
There’s also another way to estimate image similarity, based on computing a scalar score for each pair of images. We’ll be using Butteraugli ( www.github.com/google/butteraugli), a simple and user-friendly tool for measuring perceived differences between images, based on a scientific approach to psychovisual modelling. Butteraugli is easy to compile (just run $ make ) and offers a simple command-line syntax:
$ ./butteraugli file1 file2
Butteraugli accepts both JPEG and PNG files and requires that both files in a pair have equal pixel dimensions. The output is a similarity score, where 0 means that both images are identical, while any positive value reflects the amount of difference. A great extra feature of Butteraugli is its ability to draw a ‘heat map’ of differences between images. All you need to do is specify an output file:
$ ./butteraugli file1 file2 heat.ppm
The resulting PPM file highlights the areas where your images differ. The practical value of Butteraugli is that it helps detect changes that are barely visible to the naked eye. So, while Findimagedupes helps you detect similar images, Butteraugli is more specific in pointing out exactly what differs.
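If you want to script this, a pass/fail wrapper is more convenient than reading raw scores. The following sketch is our own convention, not part of Butteraugli: similar_enough succeeds when the score of two same-sized images falls below a threshold you choose, and the BUTTERAUGLI variable lets you point at the binary (defaulting to ./butteraugli, as built above):

```shell
# similar_enough FILE1 FILE2 THRESHOLD: run butteraugli on the pair and
# succeed (exit 0) if the perceptual score is below THRESHOLD.
similar_enough() {
    score=$("${BUTTERAUGLI:-./butteraugli}" "$1" "$2") || return 2
    # awk does the floating-point comparison that plain sh can't
    awk -v s="$score" -v t="$3" 'BEGIN { exit !(s + 0 < t + 0) }'
}

# Example: flag pairs that are perceptually close
#   similar_enough a.png b.png 1.5 && echo "visually close"
```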
We’ll now assume that all the images that survived our Great Purge are precious, but we still want to reduce their data footprint. Google’s engineers have created a tool that slims down JPEG files without compromising on quality. Guetzli claims a 20-30 per cent reduction in file size over the usual way JPEGs are written in Linux using libjpeg. According to the Guetzli developers, it strikes a balance between minimal loss and file size by employing a search algorithm that reconciles the limitations of the JPEG format with Guetzli’s own psychovisual model.
Guetzli works at a leisurely pace. According to Google’s documentation, it takes about one minute to encode one megapixel of bitmap data. This means Guetzli will process each of your photos painfully slowly, but on the plus side the program delivers the best JPEG file size optimisation for images compressed with a quality ratio between 84 and 100. In other words, Guetzli will be useful when you need to maintain very high image quality without increasing the amount of JPEG compression. It’s great for a one-time optimisation pass that will free some space on your file server or any other storage that you use.
Grab the code from www.github.com/google/guetzli, ensure you have the libpng and libjpeg development files, and run $ make in the project’s directory. You’ll see your binary in bin/Release, and so it’s time to put Guetzli through its paces: $ ./guetzli --quality 84 in.jpg out.jpg
The quality value of 84 is the best (and the lowest possible) for Guetzli. When tested across a set of images, it became clear that no other method of compressing JPEGs beats Guetzli at that setting in terms of file size. Here’s a sample command for batch-processing several files at once: $ for file in *.jpg; do guetzli --quality 84 "$file" "${file%.jpg}-out.jpg"; done
When working with a single file, you can skip the command line routines by turning to a third-party Guetzli graphical tool that you can get from www.github.com/till213/GuetzliImageIOPlugin. It offers a shared library and a Qt5 image plug-in together with a beginner-friendly sample app where you can load and preview your JPEGs, set some options and finally ‘bake’ your Guetzli. Although baking is incredibly CPU-intensive, Guetzli is the only state-of-the-art JPEG optimisation tool out there at the moment. The quality of the resulting images is superb – you won’t be able to tell the difference between your originals and the outputs.
Lepton drops it like it’s hot
We move on to another novel technique for reducing JPEG files. Lepton is an open source encoder from Dropbox. It claims to squeeze an extra 22 per cent out of your regular JPEGs, so we were keen to put this figure to the test.
Getting Lepton to run is easy: clone the project’s code from www.github.com/dropbox/lepton and run $ ./autogen.sh && ./configure && make && sudo make install
The syntax of Lepton commands is also straightforward: $ lepton in.jpg out.lep
As you can see, Lepton compresses your file and produces output in its own .lep format. From this moment on you can no longer open, edit or otherwise work with your files until you decompress them back: $ lepton in.lep out.jpg
Obviously, because there’s no third-party integration of Lepton in popular Linux image viewers, it effectively acts as an archive manager. This is still useful for certain applications, such as storing massive datasets on a back-up drive. It’s important to note that Lepton delivers lossless encoding, so the original file and the JPEG-LEP-JPEG processed file are identical.
Lepton works very quickly and delivers a better compression ratio than Guetzli, but bear in mind that you’ll be unable to access your files on a system without Lepton, whereas Guetzli maintains perfect backward compatibility for its JPEG files. Its high working speed makes Lepton ideal for tackling large images.
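The lossless claim is easy to verify yourself. This sketch – roundtrip_ok and the LEPTON variable are our own conventions, not part of Lepton – compresses a JPEG, decompresses it again and confirms the result is byte-identical to the original:

```shell
# roundtrip_ok FILE: compress FILE with lepton, decompress it, and check
# the round-tripped JPEG matches the original byte for byte.
roundtrip_ok() {
    tmp=$(mktemp -d)
    "${LEPTON:-lepton}" "$1" "$tmp/img.lep" &&
        "${LEPTON:-lepton}" "$tmp/img.lep" "$tmp/back.jpg" &&
        cmp -s "$1" "$tmp/back.jpg"
    rc=$?
    rm -rf "$tmp"    # clean up the scratch directory either way
    return $rc
}

# Example:
#   roundtrip_ok photo.jpg && echo "lossless, safe to archive"
```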
The easiest way to run a real-world test is to obtain a file from, for example, the Wikimedia public storage ( http://bit.ly/wiki-big-images) and try to encode it with Lepton. Note that Lepton can only tackle JPEGs smaller than 128MB, so bear this in mind before selecting large files to run through it. Furthermore, you’ll definitely need to pass some extra arguments to Lepton in order to process an image of that size, as follows: $ lepton -memory=4096M -threadmemory=128M in.jpg out.lep
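You can enforce that 128MB limit in scripts with a pure-shell guard (fits_lepton is our own name, not a Lepton feature):

```shell
# fits_lepton FILE: succeed only if FILE is under Lepton's 128MB input cap.
fits_lepton() {
    [ "$(wc -c < "$1")" -lt $((128 * 1024 * 1024)) ]
}

# Example: skip oversized inputs instead of letting lepton fail
#   fits_lepton in.jpg && lepton -memory=4096M -threadmemory=128M in.jpg out.lep
```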
We tried to compress the 0-cynefin-ORIGINEEL.jpg file (93.9MB) and produced the 0-cynefin-ORIGINEEL.lep file, which is only 64MB in size. That’s a 31 per cent reduction! Actual results vary depending on the kind of picture you work with. For instance, if the image contains areas of solid fill, line art or drawings, the encoder performs much better than on a photograph of a real-world object.
Given that both Guetzli and Lepton work with JPEGs, we were curious to compress a test file sequentially using both encoders. When applied to a set of typical 11-15MP photos, the Guetzli stage gave us between 14 and 20 per cent reduction, and the Lepton stage delivered another 20 to 25 per cent. The overall saving of this combination was between 30 and 40 per cent, which is a mind-blowing figure for nearly lossless compression (they were JPEGs, after all).
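Note that the two stages compound rather than simply add up, which is where the overall figure comes from. A small awk helper (our own, purely illustrative) makes the arithmetic explicit:

```shell
# combined_saving P1 P2: overall percentage saved after applying a P1%
# reduction followed by a P2% reduction of what remains.
combined_saving() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 - (100 - a) * (100 - b) / 100 }'
}

# combined_saving 14 20  -> 31  (the low end quoted above)
# combined_saving 20 25  -> 40  (the high end)
```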
Go wild with FLIF
We’ve already set foot on the barely charted territory of alternative image file formats, and it’s now time to leave JPEG behind and turn to PNG.
There are many instances when PNG is preferable to JPEG, such as for screenshots, web graphics and pretty much everything else apart from photographic images. FLIF stands for Free Lossless Image Format, and it’s based on MANIAC compression. MANIAC (Meta-Adaptive Near-zero Integer Arithmetic Coding) is an algorithm for entropy coding using context-adaptive binary arithmetic.
In our experience, FLIF is a nearly perfect replacement for PNG, lossless WebP, lossless BPG, lossless JPEG2000 and lossless JPEG XR formats in terms of compression ratio. The big advantage of FLIF is that it’s a universal format, which you can use for encoding any kind of image, be it a photograph, a piece of line art, a map or whatever else, and you’ll still be gaining extra kilobytes in each case.
Start by cloning the code from www.github.com/FLIFhub/FLIF and then go to the src directory inside the root tree. FLIF doesn’t have many dependencies other than the libpng development package, but in order to build the software you’ll need to specify the make targets manually: $ make flif libflif.so libflif_dec.so viewflif
The command above is pretty much self-explanatory: you get an encoder executable, two shared libraries for encoding and decoding, and a simple image viewer for FLIF files. Copy flif and viewflif into /usr/bin (or any other place in $PATH), copy the libraries to somewhere like /usr/lib64 (this may vary across Linux distros) and you’re ready for a ride. The FLIF encoder has many command line options (see the $ flif --help output), but if you don’t use any then it assumes you want lossless compression with interlacing, as in the following example: $ flif in.png out.flif
You can only use PNG or PNM files with FLIF, and you should always use the .flif extension for output files for convenience. The result is a file that’s more than 40 per cent smaller than the typical PNG (such as those compressed with the default GIMP settings) and nearly 15 per cent smaller than lossless WebP. FLIF delivers incredible file optimisations and beats all other file formats. The encoder is also reasonably fast, and if you look at the CPU load, it’s somewhere between Guetzli and Lepton.
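Converting a whole library one file at a time gets old quickly. Here’s a hedged batch sketch – plan_flif is our own name, not part of FLIF – using the same dry-run pattern as earlier: it prints one flif command per PNG, writing the .flif next to the source, and shows off the ${f%.png} suffix substitution that keeps the base name intact:

```shell
# plan_flif DIR: print a flif command for every PNG in DIR, without
# running anything. Pipe the output to sh to perform the conversion.
plan_flif() {
    for f in "$1"/*.png; do
        [ -e "$f" ] || continue   # empty glob: no PNGs in DIR
        printf 'flif %s %s\n' "$f" "${f%.png}.flif"
    done
}

# Review, then convert:
#   plan_flif ~/Pictures | sh
```

As with the earlier sketch, this assumes space-free file names.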
FLIF has been around for a while, and we already have software that supports FLIF out of the box. The most intriguing is QtFLIFPlugin ( www.github.com/spillerrec/qt-flif-plugin), which does a splendid job of bringing FLIF to the wider world. The plug-in enables all Qt-based programs to support FLIF natively, as if it were just another basic bitmap format in the line of PNG, TIFF, JPEG and others.
Getting the plug-in to work requires some accuracy in placing its files. After compiling the code you’ll get the libflif.so shared library, which has exactly the same name as the library that comes with FLIF itself. So, while you already have /usr/lib64/libflif.so, you should copy the plug-in’s libflif.so to the default destination of your system-wide image plug-ins, such as /usr/lib64/qt5/plugins/imageformats/. The *.desktop files go to /usr/share/kservices5/qimageioplugins, while x-flif.xml should settle down in /usr/share/mime/packages.
If you don’t use KDE Plasma you can still make your life easier thanks to a standalone app known as Imgviewer ( www.github.com/spillerrec/imgviewer). Associate .flif with Imgviewer in your file manager and you’ll be able to browse FLIF images on any desktop of your choice.
Thumbnailing like a boss
When browsing images with a file manager, you expect to see small previews of them. Generating thumbnails for JPEG or PNG images is straightforward for all major file managers, like Dolphin, Nautilus or Nemo. However, once you start using alternative file formats, things get a bit more interesting.
Previously, we mentioned QtFLIFPlugin, which solves the problem of FLIF thumbnails in Dolphin, but you can also obtain FLIF previews in Nautilus or Nemo using a different technique. Create an executable /usr/local/bin/flif-thumbnailer file with the following:
#!/bin/bash
temp=$(mktemp).png
flif -d "$1" "$temp"
convert "$temp" -resize "$3"x"$3" "$2"
rm "$temp"
Then create another file, /usr/share/thumbnailers/flif.thumbnailer, and fill it with the following code:
[Thumbnailer Entry]
TryExec=/usr/local/bin/flif-thumbnailer
Exec=/usr/local/bin/flif-thumbnailer %i %o %s
MimeType=image/flif;
Finally, register a new MIME file type by creating the /usr/share/mime/packages/flif.xml file. Populate it with the following lines:
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="image/flif">
    <comment>FLIF image</comment>
    <glob pattern="*.flif"/>
  </mime-type>
</mime-info>
And you’re all set. The above method is slightly ugly, as you’re asking the thumbnailer to convert each FLIF to PNG in order to draw its preview, but it works reliably and delivers a decent level of performance.
The same can be done for any other file format, if you know the decoding command. In the case of Lepton, just replace flif -d "$1" "$temp" with lepton "$1" "$temp" and adjust the other files accordingly, and it should work smoothly. Lepton’s decoding is a lot faster than FLIF’s, so you’ll experience even snappier thumbnail building for .lep files.