Visualise your data plots
Mihalis Tsoukalos explains how to create impressive visualisations with the JavaScript library D3.js and a big pile of data.
This month’s coding tutorial is on D3.js, a powerful low-level JavaScript library that can create unique, highly customisable and impressive graphical output based on your data. For simplicity, most of the examples shown here include the data to be visualised in the same HTML file that contains the JavaScript code. However, D3.js also enables you to read data from text files that reside either on your local machine or online.
The D3.js (Data-Driven Documents) library can be loaded using a <script> tag in the HTML page that contains your JavaScript code, so there’s no need to download it locally. Your code then processes the data and creates the visualisations. Additionally, you’re free to read the JavaScript code of D3.js and make any changes or improvements you want to it!
There are several advantages to using D3.js. First, it can create professional output on the fly without the need to store PNG and PDF files that you’ll have to embed into HTML code. Second, if you’re reading data from the Internet or from a local file that changes, a page refresh is enough to obtain the new data and automatically update the plots. Third, D3 can create animated and interactive graphics, and there are plenty of visualisations that can benefit from these two features. Finally, if you already know JavaScript, using D3 is straightforward as long as you’re willing to learn its rich API.
When creating plots with D3.js, bear in mind that each D3 graph requires an HTML file that contains the DOM tree and the JavaScript code for manipulating the DOM tree and the data. If you’re familiar with JavaScript you might already know that you can organise your code in multiple files that you load using the <script> tag. However, this tutorial will put all JavaScript code in a single HTML file.
Drawing the canvas
The canvas is the area that you give D3 for drawing. It can be smaller than the browser window, which is the norm. Note that the coordinates of the upper left corner of the canvas are always (0, 0), whereas the coordinates of the bottom left corner are (0, height), whatever the value of the height variable is. The coordinates of the upper right corner are (width, 0), based on the value of the width variable, and the coordinates of the bottom right corner of the canvas are (width, height).
The essential JavaScript code that does the job is:
svg.append("rect")
  .attr("width", width)
  .attr("height", height)
  .attr("fill", "orange");
SVG stands for Scalable Vector Graphics and is an XML-based vector image format. You can learn more about SVG at https://en.wikipedia.org/wiki/Scalable_Vector_Graphics. The size of the canvas is defined by two JavaScript variables: width and height. The names of the attributes for creating a rectangle, which also enable you to create a square, are also “width” and “height”. That rectangle is the canvas, which is going to be painted in orange. Explore the source code of canvas.html for more details.
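To make the rectangle and the corner coordinates concrete, here is a minimal sketch in plain JavaScript, without D3, of the kind of markup the svg.append("rect") chain would be expected to produce. The canvasMarkup() and corners() helpers and the 400 by 300 size are illustrative assumptions and are not taken from canvas.html.

```javascript
// Hypothetical helper: SVG markup for a canvas filled with a single colour.
function canvasMarkup(width, height, fill) {
  return `<svg width="${width}" height="${height}">` +
         `<rect width="${width}" height="${height}" fill="${fill}"></rect>` +
         `</svg>`;
}

// Hypothetical helper: the four corners of the canvas as [x, y] pairs.
function corners(width, height) {
  return {
    topLeft: [0, 0],
    topRight: [width, 0],
    bottomLeft: [0, height],
    bottomRight: [width, height],
  };
}

const markup = canvasMarkup(400, 300, "orange"); // assumed size
console.log(markup);
console.log(corners(400, 300));
```

Pasting the printed markup into an HTML file should show the same orange rectangle, which is a quick way to check a canvas size before any D3 code runs.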
The output of canvas.html can be seen in the screenshot (above). This type of output can be handy when you’re drawing something using D3.js but you can’t see it on your screen. This is because it enables you to see the size of your canvas as well as whether the JavaScript statements you’re executing are using the correct variable names or not.
Now that we know that everything works, we can plot some basic objects.
Shapes and text
With the canvas in place, let’s see how to draw basic shapes and text on the screen using D3. The JavaScript code that draws a circle and some text is the following:
svg.append("circle")
  .attr("cx", width/2-40)
  .attr("cy", height/2-40)
  .attr("r", 40)
  .style("fill", "blue");
const message = "Hello Linux Format!";
svg.append("text").text(message)
  .attr('x', width/2)
  .attr('y', height/2)
  .style('fill', 'black');
The attributes for creating a circle are cx, cy and r, which are the coordinates of the centre of the circle (in relation to the canvas) and the length of its radius, respectively. The x and y attributes that are set after most append() calls give the coordinates at which the new element is placed, which enables you to define the point where your text will be positioned. Note that in attr(), the first parameter is the name of the attribute, which is predefined, and the second parameter is its value. The output of text.html, which contains multiple shapes and text, can be seen here (above right).
Drawing data sets
Let’s turn to drawing a single set of integer values. The values will be generated randomly using JavaScript code. This means that each time you load singleset.html you’ll get a different output. The HTML code of singleset.html can be seen on page 80 (bottom left) and merits some explaining.
First, you should know that D3.js has multiple versions. The version of D3 used in singleset.html is v5, as defined by https://d3js.org/d3.v5.min.js in the script tag. Second, the data is stored in the dataset JavaScript variable and is generated randomly; the number of data points in dataset is held in the totalpoints variable. The values of the elements in dataset range from 0 up to max, where max is another user-defined variable. These values are displayed on the y-axis. The x-axis uses the position of each point in dataset, from 0 to totalpoints-1. Because both the x- and y-axis use numeric values, the d3.scaleLinear() function is used for both xscale and yscale. The x-axis is created using a call to d3.axisBottom(xscale) whereas the y-axis is created using a call to d3.axisLeft(yscale).
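As a rough illustration of the random data described above, the following sketch fills an array with integers between 0 and max. The variable names follow the text, but the values 20 and 100 and the use of Math.random() are assumptions; singleset.html may build its data differently.

```javascript
// Names follow the article; the concrete values are assumptions.
const totalpoints = 20; // number of data points
const max = 100;        // largest value a data point can take

// An array of totalpoints random integers between 0 and max inclusive.
const dataset = Array.from({ length: totalpoints }, () =>
  Math.floor(Math.random() * (max + 1))
);

console.log(dataset);
```

Because the values are random, every run prints a different array, which mirrors the way the plot changes on each page load.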
Generally speaking, to create an axis you usually need a straight line, some tick marks and labels for the tick marks and the axis itself. Apart from the straight line, all other elements are optional but useful.
The d3.axisBottom() function places an axis at the bottom of the graph, whereas the d3.axisLeft() function puts an axis on the left side of the graph. You can also use d3.axisRight() and d3.axisTop() for placing your axis on the right side or on top of the graph, respectively.
The yscale = d3.scaleLinear().domain([0, max]).range([height, 0]); statement tells D3.js that the values on the y-axis are going to be represented using screen positions between the value of the height variable and 0. The range is reversed because SVG y coordinates grow downwards, so the smallest value has to sit at the bottom of the canvas. The range of the real values is set by the call to domain().
The xscale = d3.scaleLinear().domain([0, totalpoints-1]).range([0, width]); statement tells D3.js that the real values on the x-axis will be from 0 to totalpoints-1, which is the job of domain(), and that these values are going to be represented in a range from 0 to width, because these are the valid values in the canvas. Both xscale and yscale deal with the real values and how they’ll be represented on the screen. Remember that although the real values might not change, the dimensions of the canvas and the area in which we’re allowed to draw them might alter.
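The mapping that the two scale statements describe can be sketched without D3. The linearScale() function below is a hand-written stand-in, not the library’s implementation, and the canvas size and data limits are assumed values chosen to make the arithmetic easy to follow.

```javascript
// Minimal stand-in for a linear scale: maps the domain [d0, d1] onto the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const width = 500, height = 300;   // assumed canvas size
const totalpoints = 11, max = 100; // assumed data limits

const xscale = linearScale([0, totalpoints - 1], [0, width]);
const yscale = linearScale([0, max], [height, 0]); // reversed range: 0 sits at the bottom

console.log(xscale(5));   // 250, halfway along the x-axis
console.log(yscale(0));   // 300, the bottom of the canvas
console.log(yscale(max)); // 0, the top of the canvas
```

Changing width or height alters only the output positions; the data values and the domain stay the same.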
Technically speaking, these scale objects map an input domain to an output range. The input domain comes from the data whereas the output range comes from the computer and its screen. Scale objects are functions that accept a value from the input domain and return a value that belongs to the output range. Apart from the linear scale, where both input and output are linear spaces, D3.js offers d3.scaleIdentity(), d3.scaleTime(), d3.scaleLog(), d3.scaleSqrt(), d3.scalePow(), d3.scaleSequential(), d3.scaleQuantize(), d3.scaleQuantile(), d3.scaleThreshold(), d3.scaleBand(), d3.scalePoint() and d3.scaleOrdinal(). The input of d3.scaleTime() is a date and its output is a number; this is used for plotting dates and times. The d3.scaleOrdinal() function is used when the input is a discrete domain, such as a set of categories, and you want to map that discrete domain to predefined values. For example, d3.scaleOrdinal() might help you with representing each category of a set with a different screen colour. The other scale objects do an analogous job as described by their name, but are used less frequently.
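The category-to-colour idea can be sketched in a few lines. The ordinalScale() function below is a simplified stand-in written for this example, and the categories and colours are made up; the real d3.scaleOrdinal() has more behaviour than is shown here.

```javascript
// Minimal stand-in for an ordinal scale: each category in the domain gets the
// value at the same position in the range, wrapping around if the range is shorter.
// Keys that are not in the domain return undefined in this sketch.
function ordinalScale(domain, range) {
  const index = new Map(domain.map((key, i) => [key, i]));
  return (key) => range[index.get(key) % range.length];
}

const colour = ordinalScale(
  ["desktop", "server", "mobile"],    // illustrative categories
  ["steelblue", "orange", "seagreen"] // illustrative colours
);

console.log(colour("server")); // orange
```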
The d3.line() call specifies that we’re going to draw a line. The data set is added to the output using an svg.append("path") block with a class of line at the end of the JavaScript code. Remember that the CSS code plays a key role in this case because it specifies the characteristics of the line that’s displayed. Try making changes to the CSS code to see the difference.
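An SVG path is drawn from the text held in its “d” attribute, so a line chart comes down to producing that text from the scaled points. The linePath() helper below is a hand-written sketch of the idea for straight segments only, with made-up pixel coordinates; it is not how singleset.html or D3 itself is implemented.

```javascript
// Turn an array of [x, y] pixel coordinates into an SVG path description:
// "M" moves to the first point and "L" draws a straight line to each later one.
function linePath(points) {
  return points
    .map(([x, y], i) => `${i === 0 ? "M" : "L"}${x},${y}`)
    .join("");
}

const d = linePath([[0, 300], [50, 120], [100, 210]]); // illustrative points
console.log(d); // M0,300L50,120L100,210
```

The resulting string is what ends up in the “d” attribute of the path element, while the stroke colour and width come from the CSS.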
Line charts are handy when you have to plot data sets with lots of elements. On the other hand, if there’s less data to plot, bar charts might be a better choice, even if you have multiple data sets.
Double the data sets
What about putting two data sets into the same plot? You’ll need to find a way to differentiate between the points of the two data sets, which isn’t that difficult given the capabilities of D3.js. The simplest way to do so is by using different colours for the elements of each data set. This time we’re going to plot the data using a scatter plot, which can be useful when you want to draw data sets with a range of elements. The most interesting JavaScript code in twosets.html is the following:
svg.append("g")
  .selectAll('dot')
  .data(dset1)
  .enter().append('circle')
  .attr('cy', function(d) { return yscale(d.y); })
  .attr('cx', function(d, i) { return xscale(i); })
  .attr('r', function(d) { return d.y; })
  .style("fill", "#69b3a2");
The data() function is used for associating the current object with a given data set. It’s the data from that data set (dset1) which will be used in the current block of code. Perhaps unsurprisingly, you’ll need a similar block of JavaScript code for drawing the data from the second data set (dset2).
What makes this plot unique is the use of circles to define the points of the scatter plot. Additionally, the radius of each circle varies because it’s based on the current value that is being displayed. Although the current data set contains single values, this technique is very handy when you have to display multiple values in two dimensions. Last, note that the “cx” value is computed from the current index of the array, so its function needs two parameters (the data element and its index), which is not the case for “cy”.
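The way each circle gets its attributes can be mimicked with Array.prototype.map(), whose callback also receives the element and its index. In the sketch below the three data values and the two toy scale functions are assumptions made for illustration; twosets.html uses real D3 scales instead.

```javascript
const dset1 = [{ y: 5 }, { y: 12 }, { y: 8 }]; // illustrative values

// Toy scales to keep the sketch short; real code would use xscale and yscale from D3.
const xscale = (i) => i * 50;
const yscale = (v) => 200 - v * 10;

// The callback receives the element d and its index i, like the accessor functions above.
const circles = dset1.map((d, i) => ({
  cx: xscale(i),   // horizontal position from the index
  cy: yscale(d.y), // vertical position from the value
  r: d.y,          // radius from the value
}));

console.log(circles);
```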
You can look at the source code of twosets.html for more details about the implementation. The output of twosets.html can be seen in the graph (above).
To the Internet
Reading a plain text data file from a GitHub repository sounds useful, so let’s do that. When working with JSON files, you’ll need to use the d3.json() function to read an external file. On the other hand, if your data is in CSV format, you can use the d3.csv() function for reading and parsing it, which is the case with readfile.html.
The name of the data file is lxf.csv and it is located in the https://github.com/mactsouk/datasets repository. However, accessing it as a plain text file requires the use of the https://raw.githubusercontent.com/mactsouk/datasets/master/lxf.csv URL. If you try accessing it as https://github.com/mactsouk/datasets/blob/master/lxf.csv, you’ll get the entire GitHub web page instead of a single file, which is not what you really want!
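The relationship between the two URL forms is regular enough to automate. The toRawUrl() helper below is a hypothetical convenience written for this example; it relies only on the URL pattern shown above and does not contact GitHub.

```javascript
// Hypothetical helper: rewrites a GitHub "blob" page URL into its raw-file URL.
function toRawUrl(blobUrl) {
  const url = new URL(blobUrl);
  // The path looks like /user/repo/blob/branch/path/to/file
  const [, user, repo, kind, ...rest] = url.pathname.split("/");
  if (url.hostname !== "github.com" || kind !== "blob") {
    throw new Error("Not a GitHub blob URL: " + blobUrl);
  }
  return `https://raw.githubusercontent.com/${user}/${repo}/${rest.join("/")}`;
}

console.log(toRawUrl("https://github.com/mactsouk/datasets/blob/master/lxf.csv"));
// https://raw.githubusercontent.com/mactsouk/datasets/master/lxf.csv
```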
The generated plot will present a bar chart of the data. The key JavaScript code is the following:
g.selectAll(".bar")
  .data(dset)
  .enter().append("rect")
  .attr("class", "bar")
  .attr("x", function(d) { return xscale(d.name); })
  .attr("y", function(d) { return yscale(d.age); })
  .attr("width", xscale.bandwidth())
  .attr("height", function(d) { return height - yscale(d.age); })
  .attr("stroke", "black");
Creating a bar chart is relatively simple: all you have to do is to draw multiple rectangles, one rectangle for each element of the data set. Apart from the starting point of the rectangle, which is defined using the “x” and “y” attributes, you need to define the width and the height of the rectangle. The colour of each bar is defined at the beginning of readfile.html using some CSS. Finally, define the stroke colour for each bar using .attr("stroke", "black").
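The geometry of each bar follows from the fact that “y” marks the top edge of a rectangle, so the bar’s height is the distance from that edge down to the baseline. The sketch below works this out in plain JavaScript; the two rows, the 300 by 200 drawing area and the toy scales are assumptions and do not reflect the contents of lxf.csv.

```javascript
const height = 200; // assumed drawing height
const dset = [      // illustrative rows, not the contents of lxf.csv
  { name: "A", age: 25 },
  { name: "B", age: 50 },
];

// Toy band scale: equal-width slots across 300 pixels, with no padding.
const bandwidth = 300 / dset.length;
const xscale = (name) => dset.findIndex((d) => d.name === name) * bandwidth;
// Toy linear scale: age 0 maps to the bottom, age 100 to the top.
const yscale = (age) => height - (age / 100) * height;

const bars = dset.map((d) => ({
  x: xscale(d.name),
  y: yscale(d.age),               // top edge of the bar
  width: bandwidth,
  height: height - yscale(d.age), // from the top edge down to the baseline
}));

console.log(bars);
```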
You can find more details of the implementation by looking at the source code of readfile.html. The output of readfile.html can be seen in the chart (above).
Plotting on a world map
You’ll often find charts detailing data spread over maps, so we’re going to look at plotting coordinates on a world map! Once again for reasons of simplicity, the data points will be hard coded in the HTML file. However, the required data for drawing the world map will be read from the internet. The data points with the latitude and longitude values are defined as follows:
data = [{"value": {"Latitude": 58.760883333, "Longitude": 9.898933333}},
  {"value": {"Latitude": 47.61, "Longitude": -122.33}}];
Note: you can add as many data points as you want.
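Before a point can be drawn, its latitude and longitude have to be converted to canvas coordinates. The sketch below uses a simple equirectangular projection purely as an illustration; the article does not say which projection map.html uses, and the project() helper and the 720 by 360 canvas are assumptions.

```javascript
// Equirectangular projection (an assumption; map.html may use a different one):
// longitude -180..180 maps to 0..width, latitude 90..-90 maps to 0..height.
function project({ Latitude, Longitude }, width, height) {
  return [
    ((Longitude + 180) / 360) * width,
    ((90 - Latitude) / 180) * height,
  ];
}

const data = [
  { value: { Latitude: 58.760883333, Longitude: 9.898933333 } },
  { value: { Latitude: 47.61, Longitude: -122.33 } },
];

const width = 720, height = 360; // assumed canvas size
const points = data.map((d) => project(d.value, width, height));
console.log(points);
```

With this projection, latitude 0 and longitude 0 land in the centre of the canvas, and points further west get smaller x values.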
More details of the implementation are in the source code of map.html. When drawing your map, you’ll have to use an external file that contains the data for the map. map.html contains JavaScript code that enables you to zoom in and zoom out, which is handy when you want to see details. The JavaScript code that adds the zoom in and zoom out functionality is as follows:
var zoom = d3.behavior.zoom()
  .on("zoom", function() {
    g.attr("transform", "translate(" + d3.event.translate.join(",") + ")scale(" + d3.event.scale + ")");
    g.selectAll("path")
      .attr("d", path.projection(projection));
  });
svg.call(zoom);
Note that d3.behavior.zoom() and d3.event.translate belong to the D3 v3 API; from v4 onwards, the zoom behaviour is created with d3.zoom() instead.
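The string concatenation inside the zoom handler is easier to read when it is pulled out on its own. The zoomTransform() helper below is a hypothetical function written for this example; it builds the same kind of “transform” value from an [x, y] offset and a zoom factor.

```javascript
// Builds the value of the SVG "transform" attribute used in the zoom handler.
// translate is an [x, y] offset in pixels and scale is the zoom factor.
function zoomTransform(translate, scale) {
  return "translate(" + translate.join(",") + ")scale(" + scale + ")";
}

console.log(zoomTransform([30, -10], 2)); // translate(30,-10)scale(2)
```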
The output of map.html can be seen below.
D3.js is powerful, but you’ll have to spend some time to learn its API and create your own library of visualisations that you can modify and adapt.