Findimagedupes
Version: GIT Web: http://bit.ly/2m6cJRw
We’re constantly looking for more applications that could help us optimise a huge collection of images gathered over recent years. We have already tried the promising Lepton and FLIF file formats and enjoyed their splendid compression ratios (see LXF215 and LXF205), but frankly, even perfect data compression doesn’t solve the problem of redundant files and general image overload.
The basic problem is that we have tons of photos, some of which are exact duplicates, while others are series of nearly identical shots, for example when taking several group photos in a row. We need to detect these and easily smash the unneeded files. A trawl through GitHub reveals many such projects built on the OpenCV computer vision library, but most of them were designed for Windows and run into problems that are hard to fix when you try to compile the code on Linux. Luckily there is a lighter yet very effective tool called Findimagedupes. This small utility hashes your files and can quickly detect both identical and similar images, and a command-line option lets you control how similar two images must be to count as a match.
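The article doesn't say which hashing scheme Findimagedupes uses, so what follows is only an illustration of the general idea. A common way to catch similar images is a perceptual hash such as the difference hash (dHash): shrink the picture to a tiny grayscale grid, record for each pixel whether it is brighter than its right-hand neighbour, and pack those comparisons into 64 bits. Two images are then compared by counting the bits that differ (the Hamming distance). In this Go sketch the names dHash and hamming are hypothetical, and the 8×9 grid is supplied directly rather than produced by decoding and resizing a real image file.

```go
package main

import (
	"fmt"
	"math/bits"
)

// dHash computes a 64-bit difference hash from an 8-row by 9-column grid of
// grayscale values (0-255). Each bit records whether a pixel is brighter than
// its right-hand neighbour. Illustration only: this is not necessarily the
// hash Findimagedupes uses, and a real tool would build the grid by decoding
// and shrinking an image file.
func dHash(gray [8][9]uint8) uint64 {
	var h uint64
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			h <<= 1
			if gray[y][x] > gray[y][x+1] {
				h |= 1
			}
		}
	}
	return h
}

// hamming counts the bits that differ between two hashes (0 to 64).
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	var orig, brighter, edited, reversed [8][9]uint8
	for y := 0; y < 8; y++ {
		for x := 0; x < 9; x++ {
			orig[y][x] = uint8(200 - 20*x)     // gradient, darker to the right
			brighter[y][x] = uint8(205 - 20*x) // same picture, slightly brighter
			reversed[y][x] = uint8(40 + 20*x)  // gradient running the other way
		}
	}
	edited = orig    // arrays are copied by value
	edited[0][0] = 0 // change a single pixel

	h := dHash(orig)
	fmt.Println(hamming(h, dHash(brighter))) // 0: brightness change ignored
	fmt.Println(hamming(h, dHash(edited)))   // 1: nearly identical
	fmt.Println(hamming(h, dHash(reversed))) // 64: completely different
}
```

Because the hash records relative brightness rather than absolute pixel values, a uniformly brightened copy hashes identically, while a small local edit flips only a bit or two.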
The application is a command-line utility written in Go, which is good news, because Go’s own tooling can fetch the code and compile it without any hassle. Once you have installed Go using your distribution’s standard package manager, set the $GOPATH variable: $ export GOPATH=/path/you/choose
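Choose the destination directory, then install Findimagedupes with: $ go get github.com/opennota/findimagedupes (with Go 1.18 or later, use $ go install github.com/opennota/findimagedupes@latest instead).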
After a minute or two you’ll find the findimagedupes executable in $GOPATH/bin. To have the application analyse your files and report any duplicates, simply pass it the path to your images as an argument. Use the -t <int> option, where <int> is a value from 0 to 64, to set the similarity threshold, like this: $ findimagedupes -t 22 ~/Images
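A 0 to 64 range suggests that -t is a maximum Hamming distance between 64-bit hashes; that is our reading, not something stated here. Under that assumption, a smaller value demands closer matches: 0 reports only identical hashes and 64 would match everything. The hypothetical similarPairs function below sketches the idea with plain pairwise comparison over precomputed hashes; the real tool may organise its search and present its results quite differently.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// similarPairs returns every pair of file names whose 64-bit hashes differ in
// at most threshold bits. Assumption: threshold plays the role of -t, so 0
// reports only identical hashes and 64 reports every pair.
func similarPairs(hashes map[string]uint64, threshold int) [][2]string {
	names := make([]string, 0, len(hashes))
	for n := range hashes {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic output order

	var pairs [][2]string
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			d := bits.OnesCount64(hashes[names[i]] ^ hashes[names[j]])
			if d <= threshold {
				pairs = append(pairs, [2]string{names[i], names[j]})
			}
		}
	}
	return pairs
}

func main() {
	// Hypothetical precomputed hashes for three files.
	hashes := map[string]uint64{
		"beach1.jpg": 0xF0F0F0F0F0F0F0F0,
		"beach2.jpg": 0xF0F0F0F0F0F0F0F1, // one bit away from beach1
		"cat.jpg":    0x0F0F0F0F0F0F0F0F, // every bit differs from beach1
	}
	fmt.Println(similarPairs(hashes, 0))  // []
	fmt.Println(similarPairs(hashes, 22)) // [[beach1.jpg beach2.jpg]]
}
```

Pairwise comparison is quadratic in the number of files, which is fine for a sketch; a large photo library would benefit from a smarter index.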
Findimagedupes remembers the hashes it has computed, so once it has chewed through your files the first time, subsequent runs are very fast.