Linux Format

Networks: Remote compiling

Mats Tage Axelsson likes to share the love and save time when tackling high-load tasks, by passing some of the workload on to other computers…

- Mats Tage Axelsson


Your day-to-day computer activities usually pass without a hitch, until you fill up your memory and start swapping. And if you’re compiling software or rendering graphics when this happens, then you’ll soon know that something’s amiss.

Both activities are memory and CPU intensive, and can take a lot of time to complete. The solution lies in distributed computing, which is the technique of using many computers to do one job.

In this article, we’ll focus on how to compile your software projects across your own computers. We’ll also see what solutions animators are using to speed up their workflow, and investigate the circumstances in which a special set-up for cross-compiling is necessary.

Your first step is to install gcc. The packages you need are provided in most distributions by the build-essential package or similar.

Because gcc supports many languages apart from C or C++, there are many additional packages available if you intend to program in other languages. Java is an interesting example because Android uses it, but Gradle is the most common way to control a Java build. Here we’ll cover the gcc used for C and C++.

We’ll begin by demonstrating how to make gcc compile, and this is achieved by issuing the make command. This becomes a habit when you compile your own code, or someone else’s. The point now is to make sure you can compile in parallel. The make command takes the jobs parameter to set how many jobs can run at the same time:
$ cd [your build directory]
$ make -j 4

This parameter needs to be set to a value that’s between one-and-a-half and twice the number of cores you have in your system. Before you can successfully distribute your load, it first has to be made parallel; only then can it be spread across many computers.

The other parameters will differ significantly between builds and so won’t be covered here. When counting your cores, you need to take into account how many cores each CPU in your “cluster” has. The jobs parameter can be complemented with the load parameter -l , which tells make not to start a new job while the system load is above the given value. This is useful if you want to use the computer while compiling, because it may become sluggish otherwise.
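As a quick sketch of that rule of thumb (assuming GNU coreutils’ nproc is available to count your cores):

```shell
# Derive a job count of twice the core count, per the rule of thumb,
# and cap new jobs once the load average reaches the number of cores.
CORES=$(nproc)
JOBS=$(( CORES * 2 ))
echo "cores=$CORES jobs=$JOBS"
# make -j "$JOBS" -l "$CORES"
```

On a multi-CPU cluster, remember to count the cores of every machine that will take part, not just the local one.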

A common problem for home programmers is that they own just one computer. The best way to avoid such sluggish performance is to spread the job over many computers. One way to do this is to use the distcc package, which will take the make command and distribute the load according to the order that you’ve defined in your configuration. The distcc package is available on many distributions by default in a fairly new version. Ubuntu features version 3.1, while the newest RC is 3.2, which was released in 2014. That’s obviously a little old, but the package seems stable and usable despite the seeming freeze in development.

Installing distcc and redirecting

The standard package includes both distcc and its daemon. If you only want to install the daemon, your best option is to compile it for the specific platform. If you only have regular user privileges you can still install and run distcc; however, the dependencies will probably be a pain, so try not to!
$ sudo apt install distcc

Now that you have distcc on both (or all) nodes, you can start the daemon on the server, which is also called the contributor.

The daemon listens on port 3632, so make sure to keep that port open on your contributors. This option is only valid for TCP connections, and while it’s fast, it’s not encrypted, so it should only be used inside your own network. As you’re no doubt aware, everything transmitted over the internet must be secured.

When you use ssh the distcc client will call the daemon on its own. Before you start a compile, you need to decide which hosts to use for the jobs you want to execute. This is done very simply in .distcc/hosts – the list will contain the hosts that you want to send your compile to. In ssh mode, there’ll be an @ sign in front of the hostnames. You also have the option of using Avahi, also known as zeroconf, to find the hosts. Here’s a plain TCP hosts file:
localhost
192.168.1.6
And here’s the file when you use ssh:
@localhost
@192.168.1.6
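The hosts file can also carry per-host options. Using distcc’s documented syntax, a /LIMIT suffix caps the number of jobs sent to a host, ,lzo enables compression and ,cpp allows remote preprocessing. A hedged example (the addresses and limits are placeholders to adapt to your own network):

```
# ~/.distcc/hosts – fastest machine first
192.168.1.6/8,lzo,cpp
192.168.1.7/4,lzo
localhost/2
```

Compression costs a little CPU on both ends but helps noticeably on slower networks.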

When this is set up you need to go to the directory where your source is and configure your compile as per the instructions of your package. When you come to run make you need to add distcc or pump, depending on what functionality you need. To use any of the remote functions in this article you should add the system you need to the CC environment variable:
$ export CC="distcc gcc"
With this variable set, the make command will use the distcc command and distribute the load as best it can. This works well for other build systems too, as long as they’re compatible with the gcc tool-chain.

Setting the CC environment variable is only one way of making distcc compile your projects. This method has a problem: many build scripts assume that the CC variable contains no spaces. For this reason, the most common method – the masquerading technique, and the one used by your package manager – is to set links in the /usr/lib/distcc directory. To make sure masquerading works you must have that directory early in your path, so regular gcc isn’t run undistributed.
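A minimal sketch of the masquerading set-up, assuming the Debian/Ubuntu masquerade directory /usr/lib/distcc (the path may differ on other distributions):

```shell
# Put the masquerade directory first in PATH so that plain `gcc`
# invocations are intercepted by distcc's wrapper links.
export PATH=/usr/lib/distcc:$PATH
echo "$PATH" | grep -q '^/usr/lib/distcc:' && echo "masquerade dir leads PATH"
# Verify with:  type gcc   (it should now report /usr/lib/distcc/gcc)
```

Put the export in your shell profile if you want every build to pick it up.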

Before you start compiling you need to configure your project, and usually you’d use autoconf for this job. The autoconf job isn’t distributed by distcc; preprocessing during make is a different case.

Currently, distcc won’t distribute the preprocessing for you by default. For this to happen you need to activate “pump” mode. This is easy to do though – just add pump before the make command:
$ sudo pump make -j8

Note that we’re doing this compile as root. This isn’t strictly necessary for most packages; however, some compiles will fail if you don’t. Root is otherwise only necessary when you install to the machine you’re on.

In this author’s case, there are many different platforms that are ideal for compiling. An old netbook, while not powerful, still has its uses. The problem then arises that a standard compiler will install to the platform it’s compiling on. In this case, the netbook is 32-bit and a newer computer is 64-bit.

So, does this mean our old 32-bit machines are useless? Of course not, because in such cases there’s cross-compile or multi-lib support. These libraries enable you to compile on any platform (well, almost), from any platform. The most common approach is to compile on heavy servers for embedded systems and small machines such as the Raspberry Pi.

To install the correct packages, search for gcc-multilib or g++-multilib with your package manager and install the required libraries, as follows:
$ sudo apt install gcc-multilib

When the multi-lib package is installed, usually one other library is also added – in this case the amd64 one. Check that the platform you want to compile for is installed, so that its libraries and header files are available when you try again. If you need libraries for more platforms, search using apt. An example is as follows:
$ apt search libgcc- | grep arm

Apt warns you that it doesn’t have a stable CLI interface, but this only matters when you write maintenance scripts, so you can safely ignore the warning here.

Once all the nodes have all the software you need, you need to establish which one should be used first. This is important because you want to use the fastest machine the most. To accomplish this, put it first in the list in the distcc hosts file. If you have many equal servers, the --randomize directive can be added to spread the choice in a random manner.
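For example, a hosts file for several equally fast helpers might look like this (the addresses are placeholders):

```
# ~/.distcc/hosts
--randomize
192.168.1.6
192.168.1.7
192.168.1.8
```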

It’s a good idea to exclude localhost if you have many machines. Note also that when distcc fails the compile continues on localhost by default, even if you haven’t defined it in your .distcc/hosts file.

If you want to see what’s happening with your different helpers, you can install and run the distcc monitor. It comes both for Gnome and as a text-based tool, as distccmon-gnome and distccmon-text respectively:

$ distccmon-gnome

The interface is a simple list with a graphical view of the load of the hosts. On one line you see the host, the current activity and finally a bar chart of the ongoing jobs on the server.

Get more from Icecc

Icecc has a few features that will help in certain cases. It provides a better way to schedule jobs and can control the helpers (but requires a central server). There’s also a simpler way of switching compiler versions, and better support for cross-compiling.

Usually, in a home setting where you have the same computers and the same distribution, you may as well stick with distcc, but if you have a mixture then icecream (icecc) is worth a look. When you have many machines, you need to run iceccd -d on most of them and icecc-scheduler -d on at least one of them. Note that you can have the scheduler on many computers to create redundancy: when one scheduler goes down, another will take over automatically.

The scheduler will send jobs according to the speed of the nodes, ensuring that the compile is as efficient as possible. If you need any more incentive to use icecc instead of distcc, look at the icemon graphical monitoring tool. It provides you with six different views and much fancier graphics to show you the state of your compilation tasks.

To take full advantage of your machines, and if you have the disk space, add ccache to your set-up. This works best when you need to recompile regularly, because ccache then picks up the parts that haven’t changed. A common example is when you add a small correction to a bigger project. If you use ccache, the system will only compile the affected parts. This could be just the code you changed – although some of the code that uses it may also be involved – but in the end the effect is usually very big.

How much disk space you’ll need will depend on the project you’re compiling but, as a comparison, compiling Firefox needs around 20GB of disk space. It’s up to you if you want to mount a separate partition; the best approach is to use a separate SSD because of the speeds you can get from it. Then decide where to put it in your configuration file. This author mounted to /mnt/ccache and then used matstage as the user, thus creating his own directory:
$ sudo mkdir /mnt/ccache/matstage
$ sudo chown matstage:matstage /mnt/ccache/matstage
File ~/.ccache/ccache.conf:
max_size = 20G
cache_dir = /mnt/ccache/matstage
prefix_command=icecc

To run icecc with ccache, we’ve added the ‘prefix_command’ statement to the ccache configuration. For testing purposes, add --with-ccache to your configure options and a jobs flag to your make flags. So, for Firefox it would be:
ac_add_options --with-ccache=/usr/bin/ccache
mk_add_options MOZ_MAKE_FLAGS="-j12"

Each project is a little different, but both options are usually available in some form and you can always add them on the command line.

Get to grips with Gradle

If you’ve followed our previous Android tutorials then you’ll be familiar with using Gradle to compile your projects. It may seem as if Gradle is only for Android projects; however, it’s actually a build system that many developers use to compile a large range of projects and programming languages.

When you want to compile your project and have been using Gradle, consider using mainframer. This extension makes it possible to run the compile on another server after everything has been set up correctly. The setup procedure starts with making sure you have ssh access to the remote node(s) and that you can reach them easily.

Next, you need to copy the script to the root of your project. Mainframer is a shell script and this makes it quite light to run and modify. The script requires a configurat­ion file in the .mainframer directory under your project root.

The config file contains the remote machine and compression level settings, and only the remote machine entry is required:
remote_machine=Samdeb
local_compression_level=4
remote_compression_level=4

Other files in the .mainframer directory include the ignore rules, which work in the same way as rsync rules do – rsync is the package used to transfer the files. For this example the project is small, so you can safely skip the ignore files and move on. When you run it the first time, the script will look for Android Studio, including checking whether you’ve accepted the license and ensuring that you have the correct NDK.

Now, on the remote machines you may not have a window manager, which makes installing the entire Studio unnecessary or even impossible. The solution is to use only the command line tools. These can be downloaded as a zip file from the Android website:
$ wget http://dl.google.com/android/repository/sdktools-linux-3859397.zip

When the tools are downloaded, you need to unpack into the Android/Sdk directory, where all Android developmen­t software lives by default.

The proof that you’ve accepted the license is in the licenses directory under the SDK directory. If you install only the command line tools on the remote machine, then the only way you can accept the license is to copy the licenses directory over from one machine to the other. The simplest way to do this is to use sftp:
$ cd Android/
$ sftp samdeb
$ cd Android/
$ put licenses/*

Without the licence files, gradle won’t be able to download platform packages to match your project. The first time you run it, the appropriat­e platform files will be downloaded to the remote machine. The system will also look for an NDK – you can ignore that warning unless you have C code in your project.


When you want to run the tool using the local version, use the following command:
$ ./mainframer ./gradlew build

The bulk of the ‘mainframer’ package needs to be added to your project’s configuration files, in the root. To activate it, start by going into the Run > Configure menu and find the Before build section. There, remove the ‘Gradle aware’ entry – mainframer will handle the connection to gradle. In the box, set ‘bash’ and ‘mainframer gradlew...’ to make mainframer control the execution of gradle. After this is done, mainframer will run the jobs on your designated node for processing. There’s currently no cluster functionality in this package, just a way to move the jobs out to a hopefully more powerful server.

Other high-power tasks

Fortunately, compiling software is easy to split into many jobs and distribute. When we say easy, we mean that the process consists of many independent jobs that lend themselves to distribution.

Another task that can be distributed is rendering your graphics. The Cycles rendering engine in Blender comes with a scheduling function; here we look at how to get this running and, at the same time, make it efficient. The best way to reduce the time taken to finish a job is to distribute the tasks using a scheduler.

Flamenco is the name of the scheduler created for Cycles, and as you may have guessed it originates from the Blender team in Amsterdam. The reason it’s included here is for comparison and also because the team claims that you can use the package to distribute other work across many computers. This requires some developmen­t effort, but it’s still an interestin­g prospect.

ML online engines

If you have a model that you need to train for your machine-learning project, you may have to wait a long time to get enough runs through the data you’ve collected. The best approach when your data sets are large is to use cloud-based solutions, such as GCP, Amazon and Alibaba. Google Cloud Platform is ready to store your files, search your data and, yes, train your models. Hopefully, we’ll soon use quantum computers to help with this – they should be excellent at the job.

In this article, we’ve seen that you can use your old machines to help with tasks that used to take a long time on a single computer. We’ve explored many time-saving alternatives, so next time you’re stuck waiting, weigh up your options and reduce your opportunities for distraction!

Here’s distccmon showing the level of activity in the other CPU, and in this case using the localhost for compiling the test package.

When Mats started with Linux he had an IBM Thinkpad and floppy disks. He still tries to squeeze the maximum out of ageing hardware, the cheapskate!

Here, the icecc monitor has an improved interface and can show you many different aspects of the compile process.

In Android Studio, be sure to set the parameters correctly in your run/debug configuration menu, because it’s easy to make a simple mistake with the path to the binaries.

The Flamenco render manager is designed to work with the Blender cloud but can be configured to run on your own computers as well. You’re on your own if you want to try, though!

The first time the mainframer runs, it will download all the necessary libraries to the remote node and leave them in a cache directory. You’ll need to make sure there’s enough disk space on your slaves.
