Next-gen Linux audio
The only good system is a sound system. Jonni Bidwell is here to tell you all about Linux’s offerings, so listen up!
First there was the Advanced Linux Sound Architecture, then PulseAudio and now PipeWire rules over them all, says Jonni Bidwell.
Sound is a sensitive issue, and as humans we’re acutely attuned to audio stimuli. The reason sprinting races are started with a gun and not a flash is that we react much more quickly to sound (in about 150ms) than to light (about 200ms). If we’re watching a film while our system is busy, the video and audio may become momentarily desynchronised.
In order to restore sync we could skip (or back up) either audio samples or video frames. Almost universally, media players opt for the latter, since viewers will notice a blip in the audio much more than a couple of dropped frames. We tend to take for granted being able to play high-quality audio without discerning any distortion, but keeping all those buffers healthy and keeping everything ticking in time with the quartz crystals in the audio hardware is hard work.
Linux often gets a bad rap for multimedia support. Whether it’s MP3 playback not working out of the box, video tearing, or Blu-rays requiring voodoo number theory and a blessing from the god Ba’al before they play (see LXF223), there is no shortage of gripes. Most of the time, though, this isn’t Linux’s fault, or even the fault of the hard-working maintainers of kernel driver stacks or multimedia projects. Very often there are murky patents governing the use of particular technologies. Then there’s hardware that doesn’t adhere to standards – and let’s not forget that dragon, DRM. In fact, Linux has an impressive, state-of-the-art multimedia stack, capable of handling not just a 7.1 soundtrack while leisurely streaming 4K video, but also, thanks to JACK, 192kHz studio recording or music production.
Further, the nascent PipeWire project will modernise things yet more, bringing low-latency playback/recording, real-time multimedia processing and support for sandboxed applications. But even today, Linux distributions have some state-of-the-art multimedia capabilities. Join us on a journey through the multimedia systems that, for the most part, we no longer need to fight with…
“Linux has a state-of-the-art multimedia stack, capable of handling a 7.1 soundtrack”
The first audio subsystem for Linux (and other UNIX-like animals), the Open Sound System (OSS), provided basic support for playback and recording, and more than satisfied the audio needs of most ‘90s bods (we were simpler creatures back then). There was also patchwork support for some devices that was provided directly by the manufacturers (some of them did care about Linux, even in the 90s), but this was generally closed source.
OSS grew out of the drivers for the then-popular Sound Blaster 16 card, which had many clones. It also provided the low-level kernel drivers for audio hardware, as well as an API for applications. As with anything vaguely hardware-related in the early days, getting sound working required recompiling your kernel, and optionally tears or hair loss. Functionally, OSS provided the /dev/dsp* and /dev/mixer* devices, which generally could only be accessed by one process at a time. This meant that two applications couldn’t play sound simultaneously, unless the hardware was capable of mixing the streams natively and OSS was able to persuade it to do so. To solve this, the KDE and Gnome desktops developed their own sound systems, aRts and ESD respectively, which did the required mixing in software and despatched the resultant stream to OSS. This worked well, and made writing audio applications much easier – unless of course you still needed direct OSS support, or wanted to support both aRts and ESD.
And so began the Jenga-like adding of layers to the audio stack. Simple DirectMedia Layer (SDL) is a wrapper around all of the above (as well as input drivers, DirectX/OpenGL and the Windows and Mac sound systems) that’s still around today. Its portability makes it especially popular for cross-platform games. But one wrapper is never enough, and so libao was born. Libao had some nice features and eventually found its way into the popular MPlayer project in 2001 in the form of libao2 – the -ao option lives on there as a means to choose which sound system to use.
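For the curious, that option survives to this day. A quick sketch of how it’s used (the exact driver names on offer depend on how your MPlayer build was compiled):

```shell
# List the audio output drivers your MPlayer build supports
mplayer -ao help

# Force a particular backend – ALSA, OSS or the SDL wrapper, say
mplayer -ao alsa song.mp3
mplayer -ao oss song.mp3
mplayer -ao sdl song.mp3
```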
ALSA in action
In 2002, OSS developer Hannu Savolainen, then contracted by 4Front Technologies to work on the stack, released OSSv4 under a proprietary licence (though it was re-released under the GPL five years later and is still developed today). This led to Linux adopting the Advanced Linux Sound Architecture (ALSA, which had been in development since 1998 – see LXF108) for the 2.6 kernel. Many people were happy with this arrangement, although there were criticisms of OSS on Linux besides its
“Open Sound System grew out of the drivers for the Sound Blaster 16 card”
newfound proprietary licensing, including its shifting a bunch of signal processing code into the kernel and other gripes lost in the sands of time.
ALSA is a complicated beast, and from the outset aimed to be much more than OSS. Most notably, ALSA wanted to treat hardware uniformly with thread-safe kernel drivers, provide software mixing where necessary (to accommodate onboard audio codecs, such as the ubiquitous AC97, which offloaded mixing duties to the CPU) and improve MIDI support. But it also wanted to be compatible with OSSv3, so an emulation layer was required.
So ALSA consisted of a kernel component, which provided the hardware drivers, together with a userland library exposing the native and OSS userspace APIs as well as the mixing component. ALSA’s own low-level API is pretty beastly, which had consequences we’ll discuss later. There were also plugins for up/downmixing, equalisation, resampling and interacting with all the other audio systems.
Setting up software mixing was a bit of a mission in the early days, and whether it worked depended on your hardware and the phase of the moon. You’d have to create a configuration file, ~/.asoundrc, call the dmix plugin to arms, and then spend a day listening to two applications fight it out. Often you’d give up and decide you didn’t really need to hear the ding of an AOL instant message if it was going to interrupt your 128kbps Metallica MP3.
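For those who never had the pleasure, a typical ~/.asoundrc of the era looked something like the sketch below. The card numbers and buffer sizes here are illustrative – in practice yours would come from aplay -l and a great deal of trial and error:

```
# ~/.asoundrc – route the default PCM through the dmix software mixer
pcm.!default {
    type plug
    slave.pcm "dmixer"
}

pcm.dmixer {
    type dmix
    ipc_key 1024            # any unique integer, shared by mixing processes
    slave {
        pcm "hw:0,0"        # first card, first device – check aplay -l
        period_size 1024
        buffer_size 4096
        rate 44100
    }
}
```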
This functionality was actually disabled by default initially, so casual desktop users still relied on ESD and aRts. Those and other systems then had to add support for ALSA, or use its OSS emulation layer. So much for progress. However, things improved, bugs were found and fixed, and mixing support was enabled by default in major distributions. Professional musicians were able to use the JACK Audio Connection Kit (JACK, started in 2001) and real-time kernel patches to seamlessly route audio between applications. All of a sudden, Linux became a serious platform for music production.
Not a perfect solution
But on the desktop, some gripes remained. ALSA’s dmix implementation was a little hacky. It didn’t really enable multiple streams to access the hardware at the same time; rather, it allowed whoever got there first to share their access. In most cases this amounted to almost the same thing, but things broke down when, for example, multiple users tried to play things simultaneously.
There was also only a single software volume control, so there was no per-application volume. In some cases this could be worked around, but in others, especially playing network audio, the shortcomings became apparent. Windows Vista, for all its resource-sapping widgets and abundant other flaws, did feature a whole new audio stack. Apple, likewise, had its CoreAudio stack, which like so many Apple things was simply magical.
Enter PulseAudio. It is perhaps a measure of how popular Linux (or perhaps just Ubuntu Linux) had become that PulseAudio drew so much criticism when it went mainstream. It started life as Polypaudio in 2004 and four years later found its way into Ubuntu 8.04. Intentions were honourable, but unfortunately things did not go so smoothly – see the original ambitiously titled mission statement at http://bit.ly/audio-mess.
PulseAudio is a sound server that sits on top of ALSA. It doesn’t touch the kernel at all, and aimed to replace middle layers such as aRts and ESD while at the same time maintaining compatibility with them. It provided exciting new features such as network audio, timer-based (as opposed to interrupt-driven) scheduling, on-the-fly switching of inputs and outputs, as well as per-stream volume controls. This brought parity with the recently released Windows 7.
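Those per-stream controls are scriptable, too. A sketch using PulseAudio’s pactl tool – the stream index (42) and sink name below are placeholders, to be taken from the real listing on your system:

```shell
# List playback streams ("sink inputs") with their indices and volumes
pactl list sink-inputs

# Set stream 42's volume to 50%
pactl set-sink-input-volume 42 50%

# Move stream 42 to a different output device, mid-playback
pactl move-sink-input 42 alsa_output.pci-0000_00_1b.0.analog-stereo
```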
The problem was, most Ubuntu users didn’t need (or didn’t think they needed) these features, and the previous ALSA/ESD arrangement worked perfectly well for them. Suddenly PulseAudio was on their systems, and audio was stuttering, distorting, or going out of sync during video playback, while certain applications no longer worked at all. Forums became awash with disgruntled users wanting to banish this audio heathen – and with users who had broken their systems by uninstalling PulseAudio without due care and attention, taking with it all the applications that depended on it, of which there were many. Complaints about PulseAudio not working began to outnumber those concerning wireless.
There were certainly some bugs in the version of PulseAudio shipped with 8.04, but it’s not fair to plant all the blame there. There were more bugs in the Ubuntu implementation. More interestingly, though, the mass adoption uncovered bugs in that sprawling ALSA API we mentioned earlier. So complex was this beast that much of its functionality remained unused (and largely undocumented) until PulseAudio came along, and suddenly all manner of hardware and applications were hitting these hitherto unused features. These bugs have all been fixed, and anyone who’s ever used HDMI audio should be grateful: without PulseAudio’s advanced routing functionality this would be much harder. The latest PulseAudio release, 11, adds support for various AirPlay and Bluetooth devices, and even includes support for GNU/Hurd.
It’s still possible to run a pure ALSA configuration. This provides lower latencies and would be suitable on a constrained device where only a single audio application is installed – for example, an old Raspberry Pi running MPD (although if you care about sound quality you really should add a DAC to it; sound quality through the Pi’s headphone jack is weak). PulseAudio is included in all major desktop distributions and by now is definitely at the ‘just works’ stage for most desktop hardware. The only reason to avoid it is for professional audio work, where sub-10ms latencies are required. There’s really only one solution here (okay, technically two), so check the box (below left) if you don’t know JACK.
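If you do go the pure-ALSA route with MPD, pointing it straight at the hardware is a one-stanza job in mpd.conf. The device string below is an example – a USB DAC typically shows up as a second card, which you can confirm with aplay -l:

```
# /etc/mpd.conf – bypass any sound server and talk to ALSA directly
audio_output {
    type    "alsa"
    name    "USB DAC"
    device  "hw:1,0"    # card 1, device 0 – check with aplay -l
}
```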
Totally wired
The latest development in the Linux multimedia sphere is GStreamer developer Wim Taymans’ PipeWire. This ambitious project was originally titled PulseVideo, and yes, it aims to do for video what PulseAudio does for audio. We don’t mean that PipeWire makes you want to uninstall it by any means necessary and revert your video-playing subsystems back to how they were, all the while cursing your distro for adopting this “rubbish”. Nay, PipeWire hopes to reduce fragmentation and simplify existing frameworks for media playback. Codecs can be managed from a single place, so there will be no more situations where a file will play in one application but not in another.
PipeWire also wants to ease the transition toward Wayland and containerised applications (those packaged as Flatpaks, Snaps, Appimages). Anyone who’s dabbled with Wayland will be aware that it currently struggles with things like screenshots and screen recording (a shortcoming that also affects remote desktop software).
The reason these don’t work is that Wayland isolates applications in the same way as Flatpaks and the rest do, preventing them from seeing what other applications are up to (including what the compositor is putting on the screen). By adding support for PipeWire in the compositor, this could be solved in a secure manner. Likewise, adding support in the SPICE protocol will benefit multimedia applications running in VMs.
You can read more at Christian Schaller’s blogpost at http://bit.ly/launching-pipewire. You can experiment with PipeWire in the recently released Fedora 27. Incidentally, this release allows hitherto verboten audio formats such as MP3, AAC and AC3 to be played without the use of third-party repositories. In a world of streaming media and awesome open formats like FLAC and Ogg Vorbis, this is perhaps less relevant than it once was, but better late than never. LXF
“PulseAudio is definitely at the ‘just works’ stage for most desktop hardware”