Linux Format

Vega-Lite graphing

Don’t let your data sit in a lowly spreadsheet. Mihalis Tsoukalos reveals how to use Vega and Vega-Lite to create impressive graphs and plots.

- Mihalis Tsoukalos


Let’s take a look at Vega and its lighter version, Vega-Lite. Both are grammars that are used for visualisation tasks. Strictly speaking, Vega-Lite is a grammar that’s based on Vega, just as LaTeX is based on TeX. The main advantage of using Vega-Lite is that it involves writing less code than Vega.

Using a grammar for visualisation has both advantages and disadvantages. The biggest disadvantage is that you’ll have to learn the rules and the restrictions of that grammar before you can use it. On the other hand, once you learn the grammar, you’ll work faster and be more productive because you’ll only need to use the relevant elements of the grammar to describe your plots. Additionally, the use of a grammar makes code modifications much easier because changing a single line can alter the look of the entire output. Finally, the use of a grammar means that it’ll be easier to identify mistakes in your code.

Because Vega-Lite code is more compact, most of the examples in this tutorial are written in Vega-Lite. So, with that in mind, let’s get started!

Viva Las Vega!

Both Vega and Vega-Lite are written in JavaScript, and both implement a grammar that offers a convenient way of creating graphs and plots, which also means that you have to follow certain rules and use certain keywords in your Vega files. Both grammars are best suited to visualising tabular data and can be used as a file format for creating and sharing data and visualisations. Moreover, both use the JSON file format for storing their code, although, as you’ll soon see, you can also embed Vega code in HTML pages.

A document in either grammar specifies the data source, which is where the data is found, and optionally any data transformations. It then specifies the type of the output and the encoding, which is the mapping between the data and the encoding channels. This is where you can add colour, grids, legends and so on to your graphs. Note that all files for this tutorial can be downloaded so you won’t have to install anything on your Linux machine. However, it’s also possible to download the Vega grammar files locally and use them from there.

Bear in mind that Vega is not meant to be a replacement for the famous D3 JavaScript library (https://d3js.org), which is a lower-level library. In fact, Vega and Vega-Lite use D3 in their implementations. D3 is more capable when it comes to visualising novel ideas, whereas Vega might be more convenient for almost all other cases, depending on the task you want to perform.

If you really want to use a scripting language instead of plain Vega or Vega-Lite code, consider the Altair Python package at https://altair-viz.github.io.

Starting simple

This section will present two code examples to get you started with Vega-Lite. Both examples will plot data as bars. But first, let’s start with the data, which will be in the following format:

    ColumnA  ColumnB
    A        2
    B        1

This data needs to be represented in a way that Vega will understand, which is the JSON format. So, the previous data set should be rewritten in JSON format using the data.values field as follows:

    {
      "data": {
        "values": [
          {"cA": "A", "cB": 2},
          {"cA": "B", "cB": 1}
        ]
      }
    }
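If your table starts life as comma-separated text, the conversion to the data.values form can be sketched in a few lines of JavaScript. This is only an illustration of how the two representations relate: the csvToValues() helper is hypothetical (it isn’t part of Vega or Vega-Lite), and it assumes simple input with no quoted fields or embedded commas.

```javascript
// Hypothetical helper: turn a small CSV string into the array that goes
// under data.values. Assumes no quoted fields or embedded commas.
function csvToValues(csvText) {
  const [headerLine, ...rowLines] = csvText.trim().split("\n");
  const headers = headerLine.split(",").map((h) => h.trim());
  return rowLines.map((line) => {
    const cells = line.split(",").map((c) => c.trim());
    const record = {};
    headers.forEach((name, i) => {
      const asNumber = Number(cells[i]);
      // Keep numeric cells as numbers so they can be plotted as quantities.
      record[name] =
        cells[i] !== "" && !Number.isNaN(asNumber) ? asNumber : cells[i];
    });
    return record;
  });
}

// The two-row table from the text, using the cA and cB field names.
const tableText = "cA,cB\nA,2\nB,1";
const embeddedData = { data: { values: csvToValues(tableText) } };
console.log(JSON.stringify(embeddedData, null, 2));
```

The printed object has the same shape as the hand-written JSON, with one object per table row and one field per column.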

Both Vega and Vega-Lite can import data from external files, which enables you to create dynamic output and have smaller files. Although both use the JSON format by default, they also support external data in the CSV (comma-separated values) and TSV (tab-separated values) formats, selected through the type property of the data source’s format object (Vega-Lite 1.x used a formatType keyword for this). However, for reasons of simplicity, all the examples in this article will use embedded data.

The following code presents a simple example of Vega; however, the contents of most of the blocks aren’t shown to save on space:

    {
      "$schema": "https://vega.github.io/schema/vega/v3.json",
      "width": 800,
      "height": 600,
      "data": [],
      "scales": [],
      "axes": [],
      "marks": []
    }

The preceding code is in JSON format, which means that it should be saved with the .json file extension. In this case, the file is named vegaExample.json.

The values of “width” and “height” are important because they determine the size of the output. Other important keywords include $schema, data, which defines the data set that will be used, and marks, which defines the type of the plot. You can find out the version of Vega used in this example by looking at the value of the $schema keyword. Although the Vega grammar enables you to change every little detail and option, you have to write lots of code, which sometimes isn’t efficient. This is the main reason for the existence of the Vega-Lite grammar, which is more convenient than plain Vega. The following code, which is a complete example, shows what Vega-Lite programs look like:

    {
      "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
      "width": 800,
      "height": 600,
      "description": "Vega Lite for Linux Format.",
      "data": {
        "values": [
          {"cA": "A", "cB": 8},
          {"cA": "B", "cB": -5},
          {"cA": "C", "cB": 1},
          {"cA": "D", "cB": 1},
          {"cA": "E", "cB": 1},
          {"cA": "F", "cB": 5}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "cA", "type": "ordinal"},
        "y": {"field": "cB", "type": "quantitative"}
      }
    }

The Vega-Lite version is smaller and simpler than the plain Vega version and it’s saved as vegaLiteExample.json. You can find out the version of Vega-Lite used by looking at the value of the $schema keyword. The encoding key tells Vega-Lite how to interpret the data and which parts of the data will be used in the plot, which is particularly useful when your data elements have multiple fields. In this case, the data for the x-axis is taken from the cA field of the input data and is of the ordinal type. Analogously, the data for the y-axis is taken from the cB field of the input data and is of the quantitative type. The value of mark specifies the kind of graph you want to create. Bear in mind that each Vega-Lite source file is converted into the Vega grammar internally.

The chart on the first page (bottom left) shows the output of both examples; on the left is the output of vegaExample.json and on the right is the output of vegaLiteExample.json. Both output images were created by embedding the existing code into an HTML page, which is the best way to present Vega and Vega-Lite output. The vegaExample.html page is used for presenting vegaExample.json and vegaLiteExample.html is used for presenting vegaLiteExample.json. It’s considered good practice to put your Vega and Vega-Lite code in HTML files from the beginning and not develop two versions of each graph.
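The HTML pages themselves aren’t reproduced here, so what follows is only a sketch of what the script part of such a page might look like. It assumes that the page has already loaded the Vega, Vega-Lite and vega-embed libraries with script tags, which provide a global vegaEmbed() function, and that the page contains an element with the id vis; both the id and the barSpec variable name are made up for this illustration.

```javascript
// The Vega-Lite specification from vegaLiteExample.json as a JavaScript object.
const barSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  width: 800,
  height: 600,
  description: "Vega Lite for Linux Format.",
  data: {
    values: [
      { cA: "A", cB: 8 }, { cA: "B", cB: -5 }, { cA: "C", cB: 1 },
      { cA: "D", cB: 1 }, { cA: "E", cB: 1 }, { cA: "F", cB: 5 }
    ]
  },
  mark: "bar",
  encoding: {
    x: { field: "cA", type: "ordinal" },
    y: { field: "cB", type: "quantitative" }
  }
};

// vegaEmbed only exists when the page has loaded the vega-embed script,
// so the call is guarded to keep this file loadable elsewhere (Node, say).
if (typeof vegaEmbed === "function") {
  vegaEmbed("#vis", barSpec); // "#vis" is an assumed element id
}
```

Keeping the specification in a variable like this means that the same object can be tweaked and re-embedded without maintaining a separate .json file.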

The following sections will provide even more interesting examples, so keep reading!

At the bar

Creating a colourful bar chart in Vega-Lite is relatively simple if you already know the information from the previous section. Additionally, the presented bar chart will display two bars for each category of data. As expected, the HTML code, which is saved as barChart.html, embeds the Vega-Lite code. The generated output can be seen in the graph on the first page (above right). As far as the Vega-Lite code in the HTML file is concerned, its most important part is the following:

    "transform": [
      {"calculate": "datum.answer == 2 ? 'NO' : 'YES'", "as": "answerNEW"}
    ],
    "mark": "bar",
    "color": {
      "field": "answerNEW",
      "type": "nominal",
      "scale": {"range": ["#BBAACC", "#AB5599"]}
    }

The datum.answer variable represents the answer field of the input data because the datum variable, which is automatically assigned by Vega-Lite, represents the current data object. The transform block is used for extending data objects with new fields. In this case, it tells Vega-Lite to create a new field called answerNEW, whose value is NO if the value of the datum.answer field is equal to 2 and YES otherwise. The value of mark specifies that you’ll create a bar chart. Finally, the color block, which is an encoding channel and so belongs inside the encoding block of the full specification, specifies the colour of each bar of the output depending on the value of the answerNEW field, which makes the generated output much more interesting.
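Because the expression used by calculate reads like JavaScript, you can mimic its effect in plain JavaScript to check which values the new field will get. The sketch below is only an illustration of the idea and not how Vega-Lite evaluates expressions internally; the sample rows and the question field are invented for the example.

```javascript
// Hypothetical survey rows; only the answer field matters here.
const surveyRows = [
  { question: "Q1", answer: 1 },
  { question: "Q1", answer: 2 },
  { question: "Q2", answer: 2 }
];

// The same test as the "calculate" expression, applied to one data object.
const calculateAnswer = (datum) => (datum.answer == 2 ? "NO" : "YES");

// Mimic {"calculate": ..., "as": "answerNEW"}: copy each data object and
// extend the copy with the new field, leaving the original rows untouched.
const extendedRows = surveyRows.map((datum) => ({
  ...datum,
  answerNEW: calculateAnswer(datum)
}));

console.log(extendedRows.map((d) => d.answerNEW)); // [ 'YES', 'NO', 'NO' ]
```

Every data object keeps its original fields and gains answerNEW, which is the field that the color block then uses to pick one of the two colours.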

Doing lines

Grid lines give your plots a professional look, so both Vega and Vega-Lite make it possible to include grid lines in your output. The name of the HTML file that contains the related Vega-Lite code is gridLines.html and its most important part is the following:

    "config": {
      "axis": {"grid": true, "gridColor": "black"}
    },

In the preceding code you specify that you want to have grid lines in the output as well as the colour of the grid lines. However, you’re not finished yet, as you’ll have to add the following line of code inside the encoding block that sets up the y-axis:

    "axis": {"tickCount": 15, "tickSize": 20}

The value of tickCount specifies the number of ticks you’ll have in the output. Notice that using a very large value for tickCount might make your output difficult to read. The graph on the previous page shows the output of the gridLines.html file; notice that gridLines.html is based on vegaLiteExample.html. If you’re really into grid lines and Vega-Lite, you should have a look at https://vega.github.io/vega-lite/docs/axis.html.
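To see where the two fragments end up in a complete specification, here’s a small JavaScript sketch that adds them to an existing Vega-Lite object. The withGridLines() function is a hypothetical helper written for this illustration, and it assumes the specification has a y block in its encoding, as the bar chart example does.

```javascript
// Return a copy of a Vega-Lite spec with grid lines switched on for all axes
// and explicit tick settings on the y-axis. The input spec is left untouched.
function withGridLines(spec, gridColor = "black") {
  const encoding = spec.encoding || {};
  return {
    ...spec,
    config: {
      ...spec.config,
      axis: { ...(spec.config || {}).axis, grid: true, gridColor }
    },
    encoding: {
      ...encoding,
      y: { ...encoding.y, axis: { tickCount: 15, tickSize: 20 } }
    }
  };
}

// A cut-down version of the bar chart specification, without its data.
const plainSpec = {
  mark: "bar",
  encoding: {
    x: { field: "cA", type: "ordinal" },
    y: { field: "cB", type: "quantitative" }
  }
};

const gridSpec = withGridLines(plainSpec);
console.log(JSON.stringify(gridSpec, null, 2));
```

The config block sits at the top level of the specification, whereas the axis settings live inside the y block of encoding, next to its field and type.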

Creating a line chart

Line charts are popular for plotting large data sets. In this section you’ll learn how to plot points on the XY plane. The name of the HTML file that contains the Vega-Lite code is lineChart.html. Notice that Vega and Vega-Lite automatically sort the data points in the output by their numerical values.

The most important Vega-Lite code of lineChart.html is the following, because this is where you define the type of chart you’ll produce as well as how your input data is treated:

    "mark": "line",
    "encoding": {
      "x": {
        "field": "x",
        "type": "quantitative",
        "axis": {"grid": false, "tickSize": 20}
      },
      "y": {
        "field": "y",
        "type": "quantitative",
        "axis": {"tickCount": 15, "tickSize": 20}
      }
    }

The “mark”: “line” part specifies that you want to create a line chart. If you put “mark”: “point” instead, then you’ll produce single points on your output that won’t be connected to each other; this option might be more appropriate when you have a large number of data points to plot. The lineChart.html file also shows how you can disable grid lines on the x-axis only by using “grid”: false.

The same technique, applied to the y block instead, works when you only want to show grid lines on the x-axis. In addition, remember to use “type”: “quantitative” in both the x and y blocks to signify that both axes will have numeric values. You can load the lineChart.html file into your preferred browser and see the generated output.
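Switching between connected lines and separate points only involves changing the value of mark, which the following JavaScript sketch makes explicit. The xyChart() function and the sample points are invented for this illustration; the encoding is the one from lineChart.html.

```javascript
// Build a Vega-Lite spec for x/y data; mark must be "line" or "point".
function xyChart(mark, values) {
  if (mark !== "line" && mark !== "point") {
    throw new Error('mark must be "line" or "point"');
  }
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
    data: { values },
    mark,
    encoding: {
      // Grid lines are disabled on the x-axis only.
      x: { field: "x", type: "quantitative", axis: { grid: false, tickSize: 20 } },
      y: { field: "y", type: "quantitative", axis: { tickCount: 15, tickSize: 20 } }
    }
  };
}

// Hypothetical data points, deliberately not listed in x order.
const samplePoints = [{ x: 3, y: 9 }, { x: 1, y: 1 }, { x: 2, y: 4 }];

const lineSpec = xyChart("line", samplePoints);
const pointSpec = xyChart("point", samplePoints);
console.log(lineSpec.mark, pointSpec.mark);
```

Both specifications share the same data and encoding, so you can compare the two outputs side by side to decide which one suits your data set.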

Two at a time

Let’s now double the fun and illustrate how you can plot two data sets as line charts on the same plot, using the code of lineChart.html as the starting point. The name of the HTML file that will be used in this section is twoSets.html. First, take this small data sample:

    {"set": "John", "x": 11, "y": 12},
    {"set": "Jane", "x": 4, "y": 5},

What differentiates the two data sets is the value of the “set” field. Without such a field, you won’t be able to plot two lines. In addition, the most critical Vega-Lite code of twoSets.html is:

    "color": {"field": "set", "type": "nominal"}
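To get a feel for why the “set” field matters, you can split a sample by that field in plain JavaScript and see which points end up in each series. This is a conceptual sketch only and not a description of how Vega-Lite works internally; the groupBySet() function and the two extra data points are made up for the example.

```javascript
// Hypothetical sample in the same shape as the twoSets.html data.
const setSamples = [
  { set: "John", x: 11, y: 12 },
  { set: "Jane", x: 4, y: 5 },
  { set: "John", x: 12, y: 14 },
  { set: "Jane", x: 6, y: 8 }
];

// Split the rows into one series per distinct value of the "set" field.
function groupBySet(rowsToGroup) {
  const series = {};
  for (const row of rowsToGroup) {
    (series[row.set] = series[row.set] || []).push({ x: row.x, y: row.y });
  }
  return series;
}

const seriesBySet = groupBySet(setSamples);
console.log(Object.keys(seriesBySet)); // one entry per line: John and Jane
```

Each distinct value of the field yields its own list of points, which corresponds to one line in the output.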

The color block declares that you’re going to plot multiple lines based on the value of the set field. The nominal type of data, which is also known as categorical data, tells Vega-Lite that you want to create separate plots based on the categories of the data. Although this tutorial won’t deal with data that represents dates and times, the “temporal” type should be used for parsing date and time values in your Vega-Lite code. You can view the output of twoSets.html using your usual browser. Plotting more than two data sets on the same output should be easy now and is left as an exercise for the reader.

Pan and zoom

Adding panning and zooming capabilities to the Vega-Lite output is straightforward. The name of the HTML page with the embedded Vega-Lite code is zoom.html. The most important part of the Vega-Lite code in the HTML file is the following:

    "mark": "circle",
    "x": {
      "field": "Weight",
      "type": "quantitative",
      "scale": {"domain": [125, 150]}
    },
    "color": {"value": "red"}

There are many things happening here. The value of mark specifies that you want to draw each point as a circle. The value of scale is “domain”: [125, 150], which means that the default output will only show points with x values between 125 and 150. If the value of scale is null then scaling will be disabled in the output. The same rule applies for the “y” block, which isn’t shown here but can be found in zoom.html. Finally, the color block defines the colour of each point in the plane.

Apart from the zooming and panning capabilities of the presented graph, zoom.html illustrates a technique that enables you to assign a value to each point that’s plotted, which provides additional information in the output.

Notice that if you remove the selection block as well as the two scale blocks from zoom.html, the zooming and panning capabilities of the graph will be disabled. If you just remove the selection block, you still won’t be able to zoom and pan, and furthermore the output will look clunky. The output of the zoom.html HTML page can be seen on the previous page (below left).
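The selection block of zoom.html isn’t shown in this article, so the following JavaScript object is only a sketch of how such a specification is commonly written for the Vega-Lite 2 schema: an interval selection that’s bound to the scales. The selection name grid, the Height field, the y domain and the data points are all assumptions made for the illustration.

```javascript
const zoomSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  data: {
    // Hypothetical measurements.
    values: [
      { Weight: 128, Height: 61 },
      { Weight: 135, Height: 66 },
      { Weight: 147, Height: 70 }
    ]
  },
  // An interval selection bound to the scales is what lets the mouse
  // drag (pan) and scroll (zoom) the visible range of both axes.
  selection: {
    grid: { type: "interval", bind: "scales" }
  },
  mark: "circle",
  encoding: {
    x: { field: "Weight", type: "quantitative", scale: { domain: [125, 150] } },
    y: { field: "Height", type: "quantitative", scale: { domain: [55, 75] } },
    color: { value: "red" }
  }
};

console.log(Object.keys(zoomSpec.selection));
```

The two domain values set the initial view only; once the selection is bound to the scales, panning and zooming change the visible range.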

This article has discussed many interesting and handy topics related to Vega-Lite. However, both the Vega and Vega-Lite grammars have many more capabilities than what’s been shown in the limited space of a tutorial. As is usually the case with programming languages of any type, the only way to learn such practical tools is to experiment on your own and make your own mistakes. So start using Vega and Vega-Lite to create stunning and professional output!

This shows the output from the barChart.html HTML file and illustrates a way of creating a chart that has two bars in each category.
The left bar chart was created with the Vega grammar; the right with Vega-Lite. The code of the Vega-Lite files is much simpler and smaller.
Mihalis Tsoukalos is the author of Go Systems Programming and Mastering Go. You can reach him at www.mtsoukalos.eu and @mactsouk.
This figure demonstrates the output of the gridLines.html file, which teaches you how to add grid lines in a Vega-Lite plot.
Figure 4: This figure shows the output of zoom.html that showcases the zooming and panning capabilities provided by the Vega-Lite grammar.
This graph presents a visual representation of the labels.html file, where you can learn how to put labels in each bar of a bar chart using Vega-Lite.
