Vega-Lite graphing

Don’t let your data sit in a lowly spreadsheet. Mihalis Tsoukalos reveals how to use Vega and Vega-Lite to create impressive graphs and plots.

Linux Format - Mihalis Tsoukalos

Let’s take a look at Vega and its lighter version, Vega-Lite. Both are grammars used for visualisation tasks. Strictly speaking, Vega-Lite is a grammar that’s based on Vega, just as LaTeX is based on TeX. The main advantage of using Vega-Lite is that it involves writing less code than Vega.

Using a grammar for visualisation has both advantages and disadvantages. The biggest disadvantage is that you’ll have to learn the rules and restrictions of that grammar before you can use it. On the other hand, once you’ve learned the grammar, you’ll work faster and be more productive because you’ll only need the relevant elements of the grammar to describe your plots. Additionally, a grammar makes code modifications much easier because changing a single line can alter the look of the entire output. Finally, using a grammar makes it easier to identify mistakes in your code.

Because Vega-Lite code is more compact, most of the examples in this tutorial will be based on the Vega-Lite grammar. So, with that in mind, let’s get started!

Viva Las Vega!

Both Vega and Vega-Lite are written in JavaScript, and both implement a grammar that offers a convenient way of creating graphs and plots, which also means that you have to follow certain rules and use certain keywords in your Vega files. Both grammars are best suited to visualising tabular data and can be used as a file format for creating and sharing data and visualisations. Moreover, both store their code in the JSON format but, as you’ll soon see, you can also embed Vega code in HTML pages.

A document in either of the two grammars specifies the data source, which is the place to find the data, and optionally any data transformations. It then specifies the type of the output and the mappings between the data and the encoding channels, which is the encoding of the data. This is where you can add colour, grids, legends and so on to your graphs. Note that all files for this tutorial can be downloaded so you won’t have to install anything on your Linux machine. However, it’s also possible to download the Vega grammar files locally and use them from there.

Bear in mind that Vega is not meant to be a replacement for the famous D3 JavaScript library (https://d3js.org), which is a lower-level library. In fact, Vega and Vega-Lite use D3 in their implementations. D3 is more capable when visualising novel ideas, whereas Vega might be more convenient for almost all other cases, depending on the task you want to perform.

If you really want to use a scripting language instead of plain Vega or Vega-Lite code, consider the Altair Python package at https://altair-viz.github.io.

Starting simple

This section will present two code examples to get you started with Vega-Lite. Both examples will plot data as bars. But first, let’s start with the data, which will be in the following format:

    ColumnA  ColumnB
    A        2
    B        1

This data needs to be represented in a way that Vega will understand, which is the JSON format. So, the previous data set should be rewritten in JSON format using the data.values field as follows:

    {
      "data": {
        "values": [
          {"cA": "A", "cB": 2},
          {"cA": "B", "cB": 1}
        ]
      }
    }
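The step from the two-column table to the data.values array can be sketched in plain JavaScript. This is only an illustration: the helper name rowsToValues is made up for this example and isn’t part of Vega or Vega-Lite.

```javascript
// Hypothetical helper (not part of Vega or Vega-Lite): turns a small table,
// given as a list of field names plus data rows, into the array that is
// stored under data.values.
function rowsToValues(fieldNames, rows) {
  return rows.map((row) => {
    const datum = {};
    fieldNames.forEach((name, i) => {
      datum[name] = row[i];
    });
    return datum;
  });
}

// The ColumnA/ColumnB table from the text, using the shorter names cA and cB.
const dataBlock = {
  data: { values: rowsToValues(["cA", "cB"], [["A", 2], ["B", 1]]) },
};

console.log(JSON.stringify(dataBlock, null, 2));
```

Because the result is an ordinary object, JSON.stringify produces text that can be pasted straight into a specification file.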

Both Vega and Vega-Lite can import data from external files, which enables you to create dynamic output and have smaller files. Notice that although both Vega and Vega-Lite use the JSON format by default, they also support external data in the CSV (comma-separated values) and TSV (tab-separated values) formats, which you select with the type field of the data block’s format property. However, for reasons of simplicity, all the examples in this article will use embedded data.

The following code presents a simple example of Vega; however, the contents of most of the blocks aren’t shown to save space:

    {
      "$schema": "https://vega.github.io/schema/vega/v3.json",
      "width": 800,
      "height": 600,
      "data": [ ],
      "scales": [ ],
      "axes": [ ],
      "marks": [ ]
    }

The preceding code is in JSON format, which means that it should be saved with the .json file extension. So the name of the preceding file is vegaExample.json.

The values of "width" and "height" are important because they determine the size of the output. Other important keywords include $schema; data, which defines the data set that will be used; and marks, which defines the type of the plot. You can find out the version of Vega used in this example by looking at the value of the $schema keyword. Although the Vega grammar enables you to change every little detail and option, you have to write lots of code, which sometimes isn’t efficient. This is the main reason for the existence of the Vega-Lite grammar, which is more convenient than plain Vega. The following code, which is a complete example, shows what Vega-Lite programs look like:

    {
      "$schema": "https://vega.github.io/schema/vega-lite/v2.0.json",
      "width": 800,
      "height": 600,
      "description": "Vega Lite for Linux Format.",
      "data": {
        "values": [
          {"cA": "A", "cB": 8}, {"cA": "B", "cB": -5},
          {"cA": "C", "cB": 1}, {"cA": "D", "cB": 1},
          {"cA": "E", "cB": 1}, {"cA": "F", "cB": 5}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "cA", "type": "ordinal"},
        "y": {"field": "cB", "type": "quantitative"}
      }
    }

The Vega-Lite version is smaller and simpler than the plain Vega version and it’s saved as vegaLiteExample.json. You can find out the version of Vega-Lite used by looking at the value of the $schema keyword. The encoding key tells Vega-Lite how to interpret the data and which parts of the data will be used in the plot, which is particularly useful when your data elements have multiple fields. In this case, the data for the x-axis is taken from the cA field of the input data and is of the ordinal type. Analogously, the data for the y-axis of the output is taken from the cB field of the input data and is of the quantitative type. The value of mark specifies the kind of graph you want to create. Bear in mind that each Vega-Lite source file is converted into the Vega grammar internally.
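Since the encoding block refers to data fields by name, a mistyped field name is an easy mistake to make. The following sketch writes the complete example as a JavaScript object and checks that every encoded field exists in the inline data. The missingFields function is a hypothetical helper written for this illustration, not something that Vega-Lite provides.

```javascript
// The complete Vega-Lite example from the text, written as a JavaScript object.
const liteSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  width: 800,
  height: 600,
  description: "Vega Lite for Linux Format.",
  data: {
    values: [
      { cA: "A", cB: 8 }, { cA: "B", cB: -5 }, { cA: "C", cB: 1 },
      { cA: "D", cB: 1 }, { cA: "E", cB: 1 }, { cA: "F", cB: 5 },
    ],
  },
  mark: "bar",
  encoding: {
    x: { field: "cA", type: "ordinal" },
    y: { field: "cB", type: "quantitative" },
  },
};

// Hypothetical check (not a Vega-Lite API): returns every encoded field that
// is missing from at least one inline data object.
function missingFields(spec) {
  const fields = Object.values(spec.encoding).map((channel) => channel.field);
  return fields.filter((field) =>
    spec.data.values.some((datum) => !(field in datum))
  );
}

console.log(missingFields(liteSpec));
```

An empty array means that both the x and the y channel point at fields that are present in every data object.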

The chart on the first page (bottom left) shows the output of both examples; on the left is the output of vegaExample.json and on the right is the output of vegaLiteExample.json. Both output images were created by embedding the existing code into an HTML page, which is the best way to present Vega and Vega-Lite output. The vegaExample.html page is used for presenting vegaExample.json and vegaLiteExample.html is used for presenting vegaLiteExample.json. It’s considered good practice to put your Vega and Vega-Lite code in HTML files from the beginning rather than develop two versions of each graph.
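The HTML pages themselves aren’t printed here. One common way of embedding a specification is the vega-embed library and its vegaEmbed() function, so the sketch below generates such a page from a specification object. The specToHtml helper, the CDN URLs with their version pins and the "vis" element id are all assumptions made for this illustration; the downloadable files may be organised differently.

```javascript
// Hypothetical helper: wraps a Vega or Vega-Lite specification in a minimal
// HTML page. The CDN URLs, the version pins and the "vis" element id are
// assumptions, not taken from the article's files.
function specToHtml(title, spec) {
  return [
    "<!DOCTYPE html>",
    "<html>",
    "<head>",
    `  <title>${title}</title>`,
    '  <script src="https://cdn.jsdelivr.net/npm/vega@3"></script>',
    '  <script src="https://cdn.jsdelivr.net/npm/vega-lite@2"></script>',
    '  <script src="https://cdn.jsdelivr.net/npm/vega-embed@3"></script>',
    "</head>",
    "<body>",
    '  <div id="vis"></div>',
    "  <script>",
    `    var spec = ${JSON.stringify(spec)};`,
    // vegaEmbed() renders the specification inside the element it is given.
    '    vegaEmbed("#vis", spec);',
    "  </script>",
    "</body>",
    "</html>",
  ].join("\n");
}

// A tiny made-up bar chart, just to have something to embed.
const page = specToHtml("Vega-Lite example", {
  data: { values: [{ cA: "A", cB: 2 }, { cA: "B", cB: 1 }] },
  mark: "bar",
  encoding: {
    x: { field: "cA", type: "ordinal" },
    y: { field: "cB", type: "quantitative" },
  },
});

console.log(page);
```

Saving the printed text as an .html file and opening it in a browser should display the chart, provided the browser can reach the CDN.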

The following sections will provide even more interesting examples, so keep reading!

At the bar

Creating a colourful bar chart in Vega-Lite is relatively simple if you already know the information from the previous section. Additionally, the presented bar chart will display two bars for each category of data. As expected, the HTML code, which is saved as barChart.html, embeds the Vega-Lite code. The generated output can be seen in the graph on the first page (above right). As far as the Vega-Lite code in the HTML file is concerned, its most important part is the following:

    "transform": [
      {"calculate": "datum.answer == 2 ? 'NO' : 'YES'", "as": "answerNEW"}
    ],
    "mark": "bar",
    "color": {
      "field": "answerNEW", "type": "nominal",
      "scale": {"range": ["#BBAACC", "#AB5599"]}
    }

The datum.answer variable represents the answer field of the input data because the datum variable, which is automatically assigned by Vega-Lite, represents the current data object. The transform block is used for extending data objects with new fields. In this case, it tells Vega-Lite to create a new field called answerNEW whose value is NO if the value of datum.answer is equal to 2 and YES otherwise. The value of mark specifies that you’ll create a bar chart. Finally, the color block, which is one of the encoding channels, specifies the colour of each bar of the output depending on the value of the answerNEW field, which makes the generated output much more interesting.
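The effect of the calculate transform can be imitated in a few lines of plain JavaScript. Vega-Lite evaluates the expression itself, so this sketch only mirrors the logic for illustration, and the sample answers are made up.

```javascript
// Made-up input data with an "answer" field, as in barChart.html.
const answers = [{ answer: 1 }, { answer: 2 }, { answer: 1 }];

// Imitates {"calculate": "datum.answer == 2 ? 'NO' : 'YES'", "as": "answerNEW"}:
// each data object is copied and extended with the new answerNEW field.
const extended = answers.map((datum) => ({
  ...datum,
  answerNEW: datum.answer == 2 ? "NO" : "YES",
}));

console.log(extended);
```

The original fields are kept, which is why the color block can use answerNEW while the other channels still use the existing fields.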

Doing lines

Grid lines give your plots a professional look, so both Vega and Vega-Lite make it possible to include grid lines in your output. The name of the HTML file that contains the related Vega-Lite code is gridLines.html and its most important part is the following:

    "config": {
      "axis": {"grid": true, "gridColor": "black"}
    },

In the preceding code you specify that you want to have grid lines in the output as well as the colour of the grid lines. However, you’re not finished yet as you’ll have to add the following line of code inside the part of the encoding block that sets up the y-axis:

    "axis": {"tickCount": 15, "tickSize": 20}

The value of tickCount specifies the number of ticks you’ll have in the output. Notice that using a very large value for tickCount might make your output difficult to read. The graph on the previous page shows the output of the gridLines.html file; notice that gridLines.html is based on vegaLiteExample.html. If you’re really into grid lines and Vega-Lite, you should have a look at https://vega.github.io/vega-lite/docs/axis.html.
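To show where the two fragments sit relative to each other, here is a sketch of a whole specification written as a JavaScript object. It reuses a few values from the earlier bar chart example; the exact contents of gridLines.html may differ.

```javascript
// A sketch of a complete specification with grid lines. The data values are
// borrowed from the earlier bar chart example.
const gridSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  data: {
    values: [{ cA: "A", cB: 8 }, { cA: "B", cB: -5 }, { cA: "C", cB: 1 }],
  },
  mark: "bar",
  encoding: {
    x: { field: "cA", type: "ordinal" },
    // The axis settings live inside the y channel of the encoding block.
    y: {
      field: "cB",
      type: "quantitative",
      axis: { tickCount: 15, tickSize: 20 },
    },
  },
  // The config block sits at the top level and applies to all axes.
  config: { axis: { grid: true, gridColor: "black" } },
};

console.log(JSON.stringify(gridSpec, null, 2));
```

The config block sets defaults for every axis, whereas the axis block inside y only affects that one axis.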

Creating a line chart

Line charts are popular for plotting large data sets. In this section you’ll learn how to plot points on the XY plane. The name of the HTML file that contains the Vega-Lite code is lineChart.html. Notice that Vega and Vega-Lite automatically sort the data points in the output by their numerical values.

The most important Vega-Lite code of lineChart.html is the following, because this is where you define the type of the chart you’ll produce as well as how you’re going to treat your input data:

    "mark": "line",
    "encoding": {
      "x": {"field": "x", "type": "quantitative",
            "axis": {"grid": false, "tickSize": 20}},
      "y": {"field": "y", "type": "quantitative",
            "axis": {"tickCount": 15, "tickSize": 20}}
    }

The "mark": "line" part specifies that you want to create a line chart. If you put "mark": "point" instead, then you’ll produce single points on your output that won’t be connected to each other; this option might be more appropriate when you have a large number of data points to plot. The lineChart.html file also shows how you can disable grid lines on the x-axis only by using "grid": false.

The same technique, applied to the y block instead, works when you only want to show grid lines on the x-axis. In addition, remember to use "type": "quantitative" on both the x and y blocks to signify that both axes will have numeric values. You can load the lineChart.html file into your preferred browser and see the generated output.
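The difference between the line and point marks is a single value, which the following sketch makes explicit. The xyChart function is a hypothetical builder written for this illustration and the sample points are made up.

```javascript
// Hypothetical builder: returns a Vega-Lite specification that plots (x, y)
// pairs. Pass "line" for a connected line or "point" for separate points.
function xyChart(mark, points) {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
    data: { values: points },
    mark: mark,
    encoding: {
      // Grid lines are switched off for the x-axis only.
      x: { field: "x", type: "quantitative", axis: { grid: false, tickSize: 20 } },
      y: { field: "y", type: "quantitative", axis: { tickCount: 15, tickSize: 20 } },
    },
  };
}

// Made-up sample data.
const samplePoints = [{ x: 1, y: 2 }, { x: 2, y: 5 }, { x: 3, y: 3 }];

const lineSpec = xyChart("line", samplePoints);
const pointSpec = xyChart("point", samplePoints);

console.log(lineSpec.mark, pointSpec.mark);
```

Everything apart from the mark value is shared, so switching between the two chart types doesn’t require touching the data or the encoding.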

Two at a time

Let’s now double the fun and illustrate how you can plot two data sets on the same plot as line charts, using the code of lineChart.html as the starting point. The name of the HTML file that will be used in this section is twoSets.html. First, take this small data sample:

    {"set": "John", "x": 11, "y": 12},
    {"set": "Jane", "x": 4, "y": 5},

What differentiates the two data sets is the value of the "set" field. Without such a field, you won’t be able to plot two lines. The most critical Vega-Lite code of twoSets.html is:

    "color": {"field": "set", "type": "nominal"}
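How the "set" field splits the data can be sketched as follows. The first two data objects come from the sample above and the other two are made up so that each set has more than one point; the list of distinct set values is computed in plain JavaScript purely to show how many lines to expect.

```javascript
// A sketch of a two-line chart. Only the first two data objects come from the
// text; the remaining two are made up.
const twoSetsSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  data: {
    values: [
      { set: "John", x: 11, y: 12 },
      { set: "Jane", x: 4, y: 5 },
      { set: "John", x: 14, y: 9 },
      { set: "Jane", x: 8, y: 7 },
    ],
  },
  mark: "line",
  encoding: {
    x: { field: "x", type: "quantitative" },
    y: { field: "y", type: "quantitative" },
    color: { field: "set", type: "nominal" },
  },
};

// One line is drawn for each distinct value of the field used by color.
const colourField = twoSetsSpec.encoding.color.field;
const lineNames = [
  ...new Set(twoSetsSpec.data.values.map((datum) => datum[colourField])),
];

console.log(lineNames);
```

Adding data objects with a third set value would produce a third line without any change to the encoding.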

The color block declares that you’re going to plot multiple lines based on the value of the set field. The nominal type of data, which is also known as categorical data, tells Vega-Lite that you want to create separate lines based on the categories of the data. Although this tutorial won’t deal with data that represents dates and times, the "temporal" type should be used for parsing date and time values in your Vega-Lite code. You can view the output of twoSets.html using your usual browser. Plotting more than two data sets on the same output should be easy now and is left as an exercise for the reader.

Pan and zoom

Adding panning and zooming capabilities to the Vega-Lite output is straightforward. The name of the HTML page with the embedded Vega-Lite code is zoom.html. The most important part of the Vega-Lite code in the HTML file is the following:

    "mark": "circle",
    "x": {
      "field": "Weight", "type": "quantitative",
      "scale": {"domain": [125, 150]}
    },
    "color": {"value": "red"}

There are many things happening here. The value of mark specifies that you want to draw each point as a circle. The value of scale is "domain": [125, 150], which means that the default output will only show points with x values between 125 and 150. If the value of scale is null then scaling will be disabled in the output. The same rule applies for the "y" block, which isn’t shown here but can be found in zoom.html. Finally, the color block defines the colour of each point in the plane.

Apart from the zooming and panning capabilities of the presented graph, zoom.html illustrates a technique that enables you to assign a value to each point that’s plotted, which provides additional information in the output.

Notice that if you remove the selection block as well as the two scale blocks from zoom.html, the zooming and panning capabilities of the graph will be disabled. If you just remove the selection block, you still won’t be able to zoom and pan, and furthermore the output will look clunky. The output of the zoom.html HTML page can be seen on the previous page (below left).
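The selection block mentioned above isn’t printed in the article. In Vega-Lite version 2, panning and zooming are typically enabled with an interval selection that’s bound to the scales, so the sketch below assumes that zoom.html does the same. The selection name grid, the Height field, the y domain and the data values are all made up for this illustration.

```javascript
// A sketch of a pan-and-zoom scatter plot, assuming the Vega-Lite v2 selection
// syntax. Only the Weight field and the x domain come from the text.
const zoomSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v2.0.json",
  data: {
    values: [
      { Weight: 128, Height: 165 },
      { Weight: 140, Height: 172 },
      { Weight: 149, Height: 181 },
    ],
  },
  // An interval selection bound to the scales lets you drag to pan and
  // scroll to zoom.
  selection: { grid: { type: "interval", bind: "scales" } },
  mark: "circle",
  encoding: {
    x: { field: "Weight", type: "quantitative", scale: { domain: [125, 150] } },
    y: { field: "Height", type: "quantitative", scale: { domain: [160, 190] } },
    color: { value: "red" },
  },
};

// Dropping the selection block leaves a static chart with the same marks.
const { selection, ...staticSpec } = zoomSpec;

console.log(Object.keys(staticSpec));
```

The two scale domains define the initial view, and the selection is what allows that view to be changed interactively.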

This article has discussed many interesting and handy topics related to Vega-Lite. However, both the Vega and Vega-Lite grammars have many more capabilities than can be shown in the limited space of a tutorial. As is usually the case with programming languages of any type, the only way to learn such practical tools is to experiment on your own and make your own mistakes. So start using Vega and Vega-Lite to create stunning and professional output!

This shows the output from the barChart.html file and illustrates a way of creating a chart that has two bars in each category.

The left bar chart was created with the Vega grammar; the right with Vega-Lite. The code of the Vega-Lite files is much simpler and smaller.

Mihalis Tsoukalos is the author of Go Systems Programming and Mastering Go. You can reach him at www.mtsoukalos.eu and @mactsouk.

This figure demonstrates the output of the gridLines.html file, which teaches you how to add grid lines to a Vega-Lite plot.

Figure 4: The output of zoom.html, which showcases the zooming and panning capabilities provided by the Vega-Lite grammar.

This graph presents a visual representation of the labels.html file, where you can learn how to put labels in each bar of a bar chart using Vega-Lite.
