WebGL: Build an MP3 visualiser
Dan Neame shows you how to create an audio visualiser that runs in a web browser, using HTML5 Audio and WebGL.
In this tutorial we’ll show you how to use HTML5 Audio and WebGL to create an MP3 visualiser that runs in the browser. Note: this tutorial assumes an intermediate understanding of JavaScript. Three.js (MIT licence) is a JavaScript library that makes light work of 3D in the browser; it abstracts away the low-level technical constructs and lets us work with more intuitive objects – cameras, lights and materials – which it can render using either WebGL or an HTML5 canvas.
WebGL is a fairly modern JavaScript API which leverages the power of the graphics processing unit (GPU) to achieve impressive physics and rendering feats. Experience with three.js or WebGL isn’t necessary, but can’t hurt! Virtually any modern browser supports the technologies we’re using here, but we’ve tested most extensively with
Firefox 42 and Chromium 47. Following this tutorial with the source code to hand is recommended, as we won’t be going over every line of code: you’ll find it on the LXFDVD in the
Tutorials/MP3vis directory. Just fire up the index.html file in any modern web browser to see it in action. In order for it to work you’ll need 3D-accelerated graphics. This doesn’t mean you have to install proprietary graphics drivers, just that your current driver setup needs to be sane – e.g. if you were to try to run this in a virtual machine then it isn’t likely to work. You’ll also need MP3 support in order to decode and play the demo track. On Ubuntu systems this involves enabling the universe repository and running:
sudo apt-get install gstreamer1.0-plugins-bad
Notice we’ve kept the index.html file mostly clean – we only use it to source the required libraries, set up our main objects and run the main loop. We’ve used the minified version of three.js, which deals with all of the WebGL nitty-gritty. sandbox.js provides functions for setting up our scene with a camera and a light source (using standard three.js constructions). Plane.js contains functions for drawing a simple plane, as well as adding vertices and faces to it. We’ll pay most attention to the AudioAnalyser.js file, which deals with loading our audio file and extracting the frequency data.
Loading and analysing the MP3
First, we create an object that will read and analyse our MP3 – we’ve called it AudioAnalyser. The constructor is quite simple; it just stores the path to the MP3 and the width of the buffer we’ll use to sample it – basically, how much detail we want on the waveform. We’ll load the file (via the loadUrl method) using a plain XMLHttpRequest with an ArrayBuffer response type (this will enable us to decode the MP3 with HTML5 Audio later).
Working with HTML5 Audio nodes is like plugging different parts of your stereo together. In order to read the frequency data of the MP3 while it’s playing, we’ll need to connect the MP3 source node to an analyser node, which in turn is connected to the destination node (the speakers).
These audio nodes are all created by an AudioContext object. We’ll need two types of audio node for our visualisation: an AudioBufferSourceNode for the source of the audio (the MP3), and an AnalyserNode to measure the audio as it plays. Each AudioBufferSourceNode is generated by calling createBufferSource() on the context, and
AnalyserNodes are generated with createAnalyser(). Once our nodes are connected, we’re ready to run an audio stream through them. The AudioContext provides a method called decodeAudioData, which we can use to translate the raw XMLHttpRequest response into playable audio. This function runs asynchronously, and takes the response as the first argument and a callback as the second. Once the audio is fully decoded, the callback executes. We set the buffer of our source node to the decoded result, and start the source to play the audio:
AudioAnalyser.prototype.onLoadAudio = function onLoadAudio(xhr){
  var context = new AudioContext();
  var analyser = context.createAnalyser();
  analyser.fftSize = this.bufferWidth;
  analyser.connect(context.destination);
  this.analyser = analyser;
  var source = context.createBufferSource();
  source.connect(analyser);
  context.decodeAudioData(xhr.response, function(buffer){
    source.buffer = buffer;
    source.start(0);
  });
};
If we run this now we’ll hear the MP3 playing, but not much else will happen. To do something with the frequency data we’ll need a way to read it from the audio as it plays. The analyser node provides a method called getByteFrequencyData, which populates a given buffer with the audio data passing through the node at that time. We wrap it in a getFrequencyData method on our AudioAnalyser:
AudioAnalyser.prototype.getFrequencyData = function getFrequencyData(){
  var frequencyBuffer = new Uint8Array(this.bufferWidth);
  this.analyser.getByteFrequencyData(frequencyBuffer);
  return frequencyBuffer;
};
This will give us a snapshot of the audio that’s currently playing. Specifically, frequencyBuffer is an array of unsigned 8-bit integers, i.e. numbers in the range 0–255, which represents the frequency spectrum of the current sample. Lower frequencies are stored towards the beginning of the array and higher ones towards the end.
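To get a feel for this data, here’s a small standalone helper (our own, not part of the tutorial’s files) that splits such a buffer into bass and treble halves and averages each:

```javascript
// Average the lower and upper halves of a frequency buffer.
// averageLevels() is a hypothetical helper, not part of the
// tutorial's AudioAnalyser -- it just shows how to interpret
// the 0-255 values returned by getFrequencyData(): low
// frequencies at the start of the array, high ones at the end.
function averageLevels(frequencyBuffer){
  var half = Math.floor(frequencyBuffer.length / 2);
  var bass = 0;
  var treble = 0;
  for(var i = 0; i < half; i++){ bass += frequencyBuffer[i]; }
  for(var j = half; j < frequencyBuffer.length; j++){ treble += frequencyBuffer[j]; }
  return {
    bass: bass / half,
    treble: treble / (frequencyBuffer.length - half)
  };
}
```

Hooking something like this up to, say, the colour or scale of the mesh is an easy way to make the visualisation react to the character of the track as well as its raw spectrum.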
Setting the scene for three.js
Notice how we’ve used the value of bufferWidth from our AudioAnalyser object here and again in the onLoadAudio() function. This ensures that our frequencyBuffer is compatible with the FFT data that we’re going to populate it with. If the getFrequencyData method is called in a loop, it enables us to log the frequency data as it changes over time:
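It’s worth knowing roughly which real-world frequencies those array positions correspond to. As a sketch (the helper below is ours, for illustration; it assumes you pass in the AudioContext’s sample rate, typically 44100 Hz or 48000 Hz), the frequency represented by a given FFT bin is:

```javascript
// Approximate centre frequency (in Hz) of FFT bin binIndex, for
// an analyser with the given fftSize running at sampleRate Hz.
// Hypothetical helper, not part of the tutorial's code.
function binFrequency(binIndex, fftSize, sampleRate){
  return binIndex * sampleRate / fftSize;
}
```

So with an fftSize of 128 at 44100 Hz, each step along frequencyBuffer covers roughly 345 Hz. One caveat: an AnalyserNode only exposes fftSize/2 bins of data (its frequencyBinCount), so with the settings above only the first half of the buffer actually carries spectrum values.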
var analyser = new AudioAnalyser({
  src: 'mp3/starworshipper.mp3',
  bufferWidth: 128
});
function visloop(){
  requestAnimationFrame(visloop);
  console.log(analyser.getFrequencyData());
}
requestAnimationFrame(visloop);
So we’re going to refresh the frequency data inside our main loop, where soon we shall also update the graphics. But let’s not get ahead of ourselves: the above snippet will just dump the frequency data to the console, which incidentally is a very good technique for general-purpose JavaScript debugging.
We can use this frequency data to dynamically update our geometry. For this we’ll need a ‘sandbox’ containing everything needed to render our geometry, and the geometry itself (we’ll update this using data from the AudioAnalyser ). Think of the sandbox object as a neat wrapper for all our three.js objects to live in. Once we’ve added the sandbox to the DOM, if we want to animate the scene we will need to move the camera and take a fresh render every time our loop executes.
Adding the dynamic geometry
Now, let’s go ahead and create another object called ‘plane’. This will be responsible for creating and updating the geometry based on the audio data. The constructor takes the width and the length of the plane, and then buildGeometry creates a new geometry with a regular grid of vertices:
Plane.prototype.buildGeometry = function buildGeometry(){
  var geometry = new THREE.Geometry();
  for(var i = 0; i < this.length; i++){
    for(var j = 0; j < this.width; j++){
      geometry.vertices.push(new THREE.Vector3(i, 0, -j));
    }
  }
  this.addFaces(geometry);
  return geometry;
};
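Because buildGeometry pushes the vertices row by row, a vertex’s position in the geometry.vertices array follows directly from its row and column. A tiny helper (ours, for illustration – the tutorial’s code inlines this maths) captures the mapping:

```javascript
// Index into geometry.vertices for the vertex at (row, column),
// matching the row-major push order used in buildGeometry:
// each row contributes `width` consecutive vertices.
function gridIndex(row, column, width){
  return row * width + column;
}
```

This same arithmetic crops up again when building faces and when smearing the terrain, so it pays to have it clear in your head.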
Once the vertices are in place, addFaces connects them together to create a solid object using Face3s. The mesh is composed of the geometry and two materials used to paint it – one double-sided Lambert material for the base colour, and a basic material set to render as a wireframe. Lambert materials are good for solid materials that don’t need to be shiny. Basic materials aren’t affected by lighting or shadows, so are good for solid colours. If we add the mesh of our plane to the sandbox with sandbox.add(plane.mesh), it will appear in the render!
Visualising the waveform
The next step is to plug in the audio data. Here, we’re going to draw the waveform on our geometry. This is done by mapping the frequency data of the audio to the Y position of the vertices on the very first row of our geometry. Remember our initial surface is an X-Z plane: all points therein have a Y component of zero. We’ll need to iterate over the vertices that compose the first row of the plane’s geometry only. These vertices are stored at indexes of zero to the ‘width’ of the plane. So if the width is 20, the vertices for the first row will be at zero to 19 in the geometry’s vertices array. We set each vertex’s Y position to the value at the corresponding position in the frequency array. We’ve scaled it down here by a factor of 10 for aesthetic reasons – you can change it if you’d prefer a spikier or flatter terrain. Finally, we notify three.js that the vertices on the geometry have changed by setting the verticesNeedUpdate flag. If we don’t set this flag, three.js will use the cached geometry and we won’t see our changes in the next render:
Plane.prototype.setFirstRow = function setFirstRow(frequencyBuffer){
  for(var i = 0; i < this.width; i++){
    this.geometry.vertices[i].y = frequencyBuffer[i] / 10;
  }
  this.geometry.verticesNeedUpdate = true;
};
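To see the mapping in isolation, here’s a standalone sketch of the same idea, working on plain objects instead of THREE.Vector3s so it runs without three.js (the function name is ours; the real method lives on Plane):

```javascript
// Standalone sketch of setFirstRow: copy each frequency value,
// scaled down by a factor of 10, into the y of the corresponding
// first-row vertex. vertices is any array of objects with a y
// property, standing in for geometry.vertices.
function setFirstRowSketch(vertices, width, frequencyBuffer){
  for(var i = 0; i < width; i++){
    vertices[i].y = frequencyBuffer[i] / 10;
  }
}
```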
Thanks to setting the castShadow and receiveShadow properties in Plane.js, the peaks will cast shadows which darken the valleys in the ‘terrain’ below them. We build faces into our geometry by joining the vertices of the mesh together to form triangles, which are generally simpler to work with than quadrilaterals – a quad is easily constructed by joining two triangles along a common edge.
There’s some numerological trickery to get your head around in nailing the indices of the required vertices. The first triangle is formed from a given starting vertex (at offset + j), the one immediately to its right (offset + j + 1) and the vertex directly below that one in the next row of the mesh (offset + w + j + 1). The next triangle is specified using the same starting vertex and two from the next row, though the order has to be considered here, otherwise you risk your face facing the wrong way. Technically this doesn’t matter for double-sided materials, though:
Plane.prototype.addFaces = function addFaces(geometry){
  var offset;
  var w = this.width;
  var l = this.length;
  for(var i = 0; i < l - 1; i++){
    for(var j = 0; j < w - 1; j++){
      offset = i * w;
      geometry.faces.push(new THREE.Face3(offset + j, offset + j + 1, offset + w + j + 1));
      geometry.faces.push(new THREE.Face3(offset + w + j + 1, offset + w + j, offset + j));
    }
  }
};
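If you want to sanity-check those indices without firing up three.js, here’s the same loop returning plain index triples instead of Face3 objects (a standalone sketch, not the tutorial’s code):

```javascript
// Standalone version of the face-building loop: returns [a, b, c]
// index triples instead of THREE.Face3 objects, so the winding
// can be inspected in isolation.
function buildFaceIndices(width, length){
  var faces = [];
  for(var i = 0; i < length - 1; i++){
    for(var j = 0; j < width - 1; j++){
      var offset = i * width;
      // First triangle: a vertex, its right-hand neighbour, and
      // the vertex below that neighbour in the next row.
      faces.push([offset + j, offset + j + 1, offset + width + j + 1]);
      // Second triangle closes the quad using the next row.
      faces.push([offset + width + j + 1, offset + width + j, offset + j]);
    }
  }
  return faces;
}
```

A 2×2 grid of vertices yields exactly one quad, i.e. the two triangles [0, 1, 3] and [3, 2, 0]; in general you get (width − 1) × (length − 1) × 2 faces.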
The frequency data is passed to the plane via setFirstRow on each tick of the game loop, before the render. If we run it now with plane.setFirstRow(freqData); in our visloop(), we can see the waveform of the MP3 reflected in the plane. You can see this yourself by commenting out the call to
plane.smear() in index.html – the rest of our plane will remain unperturbed no matter how noisy the track gets. But what if we want to make the audio affect the whole terrain rather than just the edge? As you have probably guessed, this voodoo is done by the function plane.smear, which resides in the file Plane.js. Uncomment the call if you previously commented it out. In order to manipulate the entire plane, we’ll need to keep track of the frequency data for the last 100 ticks (the length of our plane) and update the geometry with it. Fortunately, we already have a record of the last tick – it’s already in the vertex positions of the first row. Instead of creating a separate and costly two-dimensional array to hold the historical audio data, we can simply copy the vertex positions down one row on each tick.
We need two loops in order to access every vertex on the geometry. The first loop represents the rows in the plane geometry, which we run backwards up to (but not including) the first row. Then, we set the Y position for every vertex on this row to match the vertex exactly one row behind it:
Plane.prototype.smear = function smear(){
  var index;
  for(var i = this.length - 1; i > 0; i--){
    for(var j = 0; j < this.width; j++){
      index = (this.width * i) + j;
      this.geometry.vertices[index].y = this.geometry.vertices[index - this.width].y;
    }
  }
  this.geometry.verticesNeedUpdate = true;
};
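The effect is easier to see with plain numbers. Here’s a standalone sketch of the same shuffle (ours, for illustration), operating on a flat array of Y values in the same row-major layout:

```javascript
// Standalone sketch of smear: ys holds `width` y values per row
// in row-major order. Each row takes the values of the row
// before it; iterating backwards means nothing is overwritten
// before it has been copied onwards.
function smearSketch(ys, width, length){
  for(var i = length - 1; i > 0; i--){
    for(var j = 0; j < width; j++){
      var index = (width * i) + j;
      ys[index] = ys[index - width];
    }
  }
}
```

Starting from [1, 2, 0, 0, 0, 0] (a 2-wide, 3-long plane with data only in its first row), one call produces [1, 2, 1, 2, 0, 0] and a second produces [1, 2, 1, 2, 1, 2] – exactly how the waveform ripples down the terrain one row per tick.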
This ‘smears’ the terrain detail down by one row every time it’s run, so if we run it on each tick of the game loop, it completes the terrain effect. So our main visualisation loop, as we noted at the beginning of the article, is quite simple. We introduce the stanza with the modern requestAnimationFrame() function, which ensures that frames are updated efficiently and without the need to specify a fixed update interval. Strictly speaking, that function just ensures that the subsequent lines of visloop() are run regularly (unless it’s not worth doing, e.g. we have it running in a background tab); the actual drawing is done with the final call to sandbox.render. Then it’s just a case of grabbing our array of frequency data, drawing it on the first row, ‘smearing’ the data so it is all copied down one row, rotating the camera and finally rendering everything:
function visloop(){
  requestAnimationFrame(visloop);
  var freqData = analyser.getFrequencyData();
  plane.setFirstRow(freqData);
  plane.smear();
  sandbox.rotateCamera();
  sandbox.render();
}
That’s it for our visualiser, but there are plenty of ways you could take it further – how about using two channels for stereo, or maybe you could have the camera fly over the peaks instead of rotating? You could even try mapping the audio to a sphere or tube for some interesting effects!