Linux Format

Building calendars in R

Mihalis Tsoukalos explains how to work with dates and times in R, along with a touch of Python to draw impressive calendars.

- Mihalis Tsoukalos is a data engineer and a technical writer. He is also the author of Go Systems Programming and Mastering Go, 2nd edition. You can reach him at www.mtsoukalos.eu and @mactsouk.

Mihalis Tsoukalos explains that these aren’t calendars for pirates and shows you how to work with dates and times in R, along with Python to draw calendars.

The subject of this tutorial is creating calendars using R. The R language for statistical computing is a free implementation of the S language. This tutorial will show you how to use a Docker image to run RStudio and how to work with dates and times in R, before explaining how to create calendars. Being able to work with dates and times, and knowing how to extract the desired parts from a date string or structure, is really important for defining the calendar you want.

We’ll then move on to explaining how to use Python 3 to create a calendar heat map with random data.

Installing R

You can install R using your favourite package manager or by following the instructions found in the R FAQ (https://cran.r-project.org/doc/faq/r-faq.html). As you will see, you can also run R and RStudio using a Docker image.

You can enter the R shell by executing R in your favourite terminal. You can find the version of R you are using by executing the R --version command, provided that the directory of the R binary is in your PATH environment variable. The output will be similar to the following:

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...

If you have time and want to see some of the capabilities of R, you can execute the demo() and help() commands from the R shell. In particular, the demo(graphics) command presents a demo of the graphical capabilities of R. Figure 1 shows a screen from the output of demo(graphics).

Using a Docker image

Nowadays, almost all software can be run as a container using a Docker image, and R is no exception. You can get the official R Docker image by executing docker pull r-base – find more information about that Docker image by visiting https://hub.docker.com/_/r-base. However, this tutorial will use the rocker/rstudio:latest Docker image that contains R as well as RStudio. You can find more information about it at https://hub.docker.com/r/rocker/rstudio.

For reasons of simplicity, we are going to use a docker-compose.yml file to run the RStudio Docker image. There will be a shared volume between the local machine and the Docker image that will enable the running container to see the R scripts that we are going to create without any extra steps. Alternatively, you can copy and paste the R code into RStudio.

Although you can do the same using additional parameters to the docker run command, a docker-compose.yml configuration file makes things much simpler. Its contents are the following:

version: '2'
services:
  r:
    image: rocker/rstudio:latest
    container_name: r-project
    restart: always
    environment:
      - PASSWORD=LXF
    ports:
      - 8787:8787
    volumes:
      - $HOME/public/r/scripts:/data

Please store docker-compose.yml in its own directory for reasons of simplicity. After that you can start the container by running docker-compose up and stop it by running docker-compose down from inside that directory. The RStudio username is rstudio, whereas the password is what you put in the PASSWORD field in docker-compose.yml – in this case LXF. The volumes part associates a directory on the local machine ($HOME/public/r/scripts) with a directory in the running container (/data). Everything you put in $HOME/public/r/scripts will be visible in the running container. Lastly, the port number for connecting to the web interface of RStudio is 8787 – this is defined in the ports section of docker-compose.yml.

Note that transferring that docker-compose.yml to any other computer with any operating system that supports Docker will create exactly the same environment for you to work in – the only thing that might change is the path of the shared volume, which is an optional parameter. You can get a shell inside the running container by executing docker exec -it r-project bash.

Hot dates in R

R offers the built-in Date, POSIXlt and POSIXct classes for storing dates and times, and there are also the zoo and chron packages. This tutorial will only use the built-in types of R. The general principle is that if you are using a non-standard format, you will have to specify the format you are using. Moreover, R allows you to do calculations with dates, which can be very practical.

The POSIXlt (lt: local time) class allows you to easily extract specific components out of a time because it uses a list with separate vectors for storing the year, month, day of week, day within the year, etc. It is important to remember that POSIXlt objects, which are lists, are not continuous variables.

POSIXct (ct: calendar time) is the best class when you have times in your data; among other things, the class enables you to specify the time zone of a date.

POSIXct keeps the epoch time, which means that it holds the number of seconds passed since 00:00:00 UTC Thursday 1 January 1970, which also means that POSIXct variables are continuous variables. In practice, this means that if you want to make statistical calculations that involve dates and times, using POSIXct variables is the right choice.
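The difference between the two representations can be sketched in Python 3, the language this tutorial uses later for the heat map. This is an analogy only, not R code: Python's time.struct_time plays the role of POSIXlt (separate fields) and a plain number of seconds plays the role of POSIXct.

```python
import calendar
import time

# Seconds since 1970-01-01 00:00:00 UTC: one continuous number,
# which is the idea behind R's POSIXct.
epoch_seconds = 0

# Broken-down time with separate fields, which is the idea behind R's POSIXlt.
parts = time.gmtime(epoch_seconds)
print(parts.tm_year, parts.tm_mon, parts.tm_mday)  # year, month, day of month
print(parts.tm_wday)  # day of the week, where Monday is 0
print(parts.tm_yday)  # day within the year, starting at 1

# Arithmetic is trivial on the continuous form: add one day, in seconds.
next_day = time.gmtime(epoch_seconds + 24 * 60 * 60)

# Converting the broken-down form back gives the original number.
round_trip = calendar.timegm(parts)
print(round_trip)
```

The continuous form is the one you can add, subtract and average; the broken-down form is the one you read components from.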

If you only have dates in your data, then you should use the Date class. Type help(DateTimeClasses) to get more information about date and time classes.

Additionally, the strptime(), as.POSIXct() and as.POSIXlt() functions allow you to convert a factor or a string into a date – the user must provide a format statement in double quotes to inform R about the structure of the input. The difftime() function can also help you find the difference between two dates.

It is very important to use the correct code (format component) when parsing dates and times in R: %Y is for four-digit years, whereas %y is for two-digit years. Use %d to declare the day of the month, and use %m for declaring the month as a decimal number. Use %B as the code for the full name of a month, and %b for the abbreviated name of a month. It is wise to try things using small samples before working with real data, especially when dealing with dates and times, where it is easy to make small typos that can create big errors.

It is now time to see some of the aforementioned functions in action. We will begin by presenting some simple R commands that allow you to get the current date and time:
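These format codes come from the C strptime/strftime family, so Python 3 (used later in this tutorial) understands the same ones. The following is a small sketch in Python rather than R, with made-up sample strings, that tries each code on a tiny input first, as suggested above. English month names are assumed (the default C locale).

```python
from datetime import datetime

# %d = day of month, %m = month as a number, %Y = four-digit year.
full_year = datetime.strptime("17/12/2019", "%d/%m/%Y")

# %y = two-digit year, so the input must really contain two digits.
short_year = datetime.strptime("17/12/19", "%d/%m/%y")

# %B = full month name, %b = abbreviated month name.
long_month = datetime.strptime("17 December 2019", "%d %B %Y")
abbr_month = datetime.strptime("17 Dec 2019", "%d %b %Y")

# Using %y against a four-digit year is a typical small typo.
# Python rejects the input with a ValueError.
try:
    datetime.strptime("17/12/2019", "%d/%m/%y")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(full_year.date(), short_year.date(), long_month.date(), abbr_month.date())
print("mismatch detected:", mismatch_detected)
```

All four successful calls describe the same day, which makes a wrong code easy to spot before it reaches real data.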

> Sys.time()
[1] "2019-12-17 18:58:58 UTC"
> Sys.Date()
[1] "2019-12-17"
> format(Sys.Date(), "%Y")
[1] "2019"
> format(Sys.Date(), "%B")
[1] "December"

The first command prints the current date and time, whereas the second command prints the current date only. The third and fourth commands print the current year using four digits and the full name of the current month, respectively. Note that Sys.Date() returns an object of class Date whereas Sys.time() returns an object of class POSIXct.

Another useful capability is being able to extract the month and the year out of a date that is stored in the Epoch format – displaying dates and times in the UNIX Epoch format has no practical meaning for the people that are going to see a calendar or another kind of output. The trick here is to convert the Epoch time into a valid R object. The same can happen with a date and time pair that is given as a string (such as 28 October 2019 18:30). However, in the latter case you will have to define the format of the input string for R to convert it into a valid object using format components. Both cases are illustrated in the next interaction with the R shell:

> ep <- 1576610461
> as.POSIXct(ep, origin="1970-01-01")
[1] "2019-12-17 19:21:01 UTC"
> datetime = "17/12/2019 21:15"
> mydate <- as.POSIXct(datetime, format="%d/%m/%Y %H:%M")
> mydate
[1] "2019-12-17 21:15:00 UTC"

The codes in the format string specification (%d/%m/%Y %H:%M) should represent the format of the string that is going to be converted into a POSIXct object. Note the %Y, because the year in the input has four digits. If the codes are not correct, you will most likely get an error message or a missing value or, if you are very unlucky, an incorrect value for your POSIXct object.
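For comparison, here is a sketch of the same two conversions in Python 3, using only the standard datetime module. The variable names are ours, and attaching UTC to the parsed string is an assumption made so that both values print the same way.

```python
from datetime import datetime, timezone

ep = 1576610461

# Epoch seconds to a timezone-aware datetime in UTC.
from_epoch = datetime.fromtimestamp(ep, tz=timezone.utc)
print(from_epoch.strftime("%Y-%m-%d %H:%M:%S %Z"))

# A date and time given as a string needs a matching format string.
# The parsed value carries no time zone, so UTC is attached explicitly here.
text = "17/12/2019 21:15"
parsed = datetime.strptime(text, "%d/%m/%Y %H:%M").replace(tzinfo=timezone.utc)
print(parsed.strftime("%Y-%m-%d %H:%M:%S %Z"))

# Extract the month name and the year, the parts a calendar would display.
month_name = from_epoch.strftime("%B")
print(month_name, from_epoch.year)
```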

The following example shows how you can use the difftime() function:

> d1 <- as.Date("2018-12-22")
> d2 <- as.Date("2019-12-18")
> difftime(d2, d1)
Time difference of 361 days
> difftime(d2, d1, units="mins")
Time difference of 519840 mins

The first difftime() function call shows the difference between two dates in days, whereas the second one shows the same difference in minutes. Other valid units are auto, secs, hours and weeks. Therefore, if you want to calculate the difference between the current time and Christmas 2020 in seconds you will need to execute difftime(as.Date("2020-12-25"), Sys.time(), units="secs").
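The arithmetic is easy to check by hand, or with a few lines of Python 3. This sketch uses the same two dates and the standard datetime module; it is a cross-check of the numbers, not a replacement for difftime().

```python
from datetime import date, timedelta

d1 = date(2018, 12, 22)
d2 = date(2019, 12, 18)

# Subtracting two dates gives a timedelta.
delta = d2 - d1
print(delta.days, "days")

# Express the same difference in other units by dividing
# by a timedelta that is one unit long.
minutes = delta / timedelta(minutes=1)
weeks = delta / timedelta(weeks=1)
print(minutes, "mins")
print(weeks, "weeks")
```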

Figure 2 illustrates some of the date-related functions of R using the RStudio environment for interacting with R.

Creating your first calendar

The following R code, which is saved as calendar.r, can help you create your first calendar with the help of the ggcal R package:

library(ggplot2)
library(ggcal)
date_range <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by="1 day")
fills <- rnorm(length(date_range))
print(ggcal(date_range, fills))

Note that in order to install ggcal, you will need to execute the devtools::install_github("jayjacobs/ggcal") command in R or RStudio, provided that you have already installed the devtools R package.

Let us talk about what the previous R code does. The first part of calendar.r is about loading the required R packages, which in this case are just ggplot2 and ggcal. Then, you define the range of dates that you want to get in your plot using the date_range variable. Next, you define the colours of the days of the month – in this case using one random value per day drawn from a Normal Distribution with the help of the rnorm() function. Finally, you print the calendar using the print(ggcal(date_range, fills)) statement. The presented code offers simplicity in exchange for a lack of customisation.

Figure 3 shows the graphical output of the previous R commands. The default output of ggcal is fairly basic – the good thing is that it enables you to define the range of dates that you want to get on your screen. Additionally, it colours each day of the month with a pleasant colour.
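A month calendar is, underneath, a grid with one row per week and one column per day of the week. As a rough illustration of that layout only (this uses Python 3's standard calendar module and says nothing about how ggcal is implemented), January 2020 can be laid out as text:

```python
import calendar

# Weeks of January 2020 as rows, Monday first; 0 marks a cell outside the month.
weeks = calendar.Calendar(firstweekday=calendar.MONDAY).monthdayscalendar(2020, 1)

print("Mo Tu We Th Fr Sa Su")
for week in weeks:
    # Print each day right-aligned in two characters, blanks for empty cells.
    print(" ".join(f"{day:2d}" if day else "  " for day in week))
```

A graphical calendar then only has to decide how to colour each cell of that grid.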

The next version, which is saved as weekdays.r, will create an output where you can easily differentiate between weekdays and weekends:

library(ggplot2)
library(ggcal)
date_range <- seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by="1 day")
fills <- ifelse(format(date_range, "%w") %in% c(0,6), "weekend", "weekday")
ggcal(date_range, fills) + scale_fill_manual(values=c("weekday"="darkgray", "weekend"="darkgreen"))

The magic happens at the definition of the fills variable:

> fills
 [1] "weekday" "weekday" "weekday" "weekend" "weekend" "weekday" "weekday" "weekday" "weekday" "weekday" "weekend"
[12] "weekend" "weekday" "weekday" "weekday" "weekday" "weekday" "weekend" "weekend" "weekday" "weekday" "weekday"
...

From this, each date of the month is either characterised as weekday or weekend using the R %in% operator. Before that, each day of the week is given a number from 0 to 6 using format(date_range, "%w"). Weekends are the days of the week with a number of 0 (Sunday) or 6 (Saturday) – this is what %in% catches and characterises as weekend. All the other days of the month are characterised as weekday. The scale_fill_manual(values=c("weekday"="darkgray", "weekend"="darkgreen")) statement makes sure that weekend and weekday days are coloured differently.
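The classification step is independent of the plotting and can be tried on its own. Here is a minimal sketch in Python 3, assuming January 2020 as the range of dates; the %w code behaves the same way, with Sunday as 0 and Saturday as 6.

```python
from datetime import date, timedelta

start = date(2020, 1, 1)
end = date(2020, 1, 31)

# Every day from start to end, inclusive.
date_range = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# %w gives the day of the week as a number: Sunday is 0, Saturday is 6.
fills = ["weekend" if d.strftime("%w") in ("0", "6") else "weekday"
         for d in date_range]

print(fills[:5])
print(fills.count("weekend"), "weekend days,", fills.count("weekday"), "weekdays")
```

1 January 2020 was a Wednesday, so the list starts with three weekdays followed by a weekend.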

Customisin­g your calendar

Now it’s time to print dates on the calendar using the code of dates.r. The most important part of dates.r is the following:

dates <- data.frame(date=seq(as.Date('2020-01-01'), as.Date('2020-04-30'), by=1))
dates$month <- factor(strftime(dates$date, format="%B"), levels=c("January", "February", "March", "April"))

The dates variable is where you define the range of dates that is going to be included in the output, whereas the dates$month column is where you define the names of the months that interest you.

Figure 4 shows the output of the previous commands. Although the generated output here is much more impressive than the output that’s shown in Figure 3, you will need to type more code and make more modifications to it. However, if all that you need is a simple calendar, then you will only need to change the definition of the dates and dates$month variables. Note that dates.r uses the ggplot2 R package only.

Python heat maps

A heat map is a smart way of visualising a table of numbers, where you substitute the real values with coloured cells. You can easily understand a heat map if you think of it as a table or spreadsheet that contains colours instead of numbers.

Let’s look at a Python 3 script that creates a calendar heat map. The Python 3 code is saved in heatmap.py. The most important Python 3 code is the following:

I = 1.1 - np.cos(X.ravel()) + np.random.normal(0, .2, X.size)
calmap(ax, 2020, I.reshape(53,7).T)
I = 1.1 - np.cos(X.ravel()) + np.random.normal(0, .2, X.size)
calmap(ax, 2021, I.reshape(53,7).T)

Each calmap() entry corresponds to a year that is going to be plotted. The I variable contains random data that is going to be plotted as a heat map. Each year has its own definition of the I variable.
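To see what the data for one year looks like before it is plotted, here is a plain-Python stand-in for the NumPy calls. It is a sketch, not the code of heatmap.py: the function name is ours, the definition of X is not part of the excerpt so it is assumed to be evenly spaced points between -1 and 1, and a seed is added only to make runs repeatable.

```python
import math
import random

WEEKS, DAYS = 53, 7

def random_year_grid(seed=None):
    """Return a 7 x 53 grid (rows: days of the week, columns: weeks)."""
    rng = random.Random(seed)
    n = WEEKS * DAYS
    # Assumption: X is n evenly spaced points between -1 and 1.
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    # A smooth curve plus Gaussian noise (mean 0, standard deviation 0.2).
    values = [1.1 - math.cos(x) + rng.gauss(0, 0.2) for x in xs]
    # reshape(53, 7): cut the flat list into 53 rows of 7 values.
    by_week = [values[w * DAYS:(w + 1) * DAYS] for w in range(WEEKS)]
    # Transpose: each row becomes a day of the week, each column a week.
    return [list(column) for column in zip(*by_week)]

grid_2020 = random_year_grid(seed=2020)
grid_2021 = random_year_grid(seed=2021)
print(len(grid_2020), "rows x", len(grid_2020[0]), "columns")
```

Each cell of the grid is the value that ends up as one coloured square of the heat map.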

Figure 5 shows the graphical output of heatmap.py. Note that heatmap.py uses randomly generated data, which means that you will get a different output each time you execute it.

In the files that come with this tutorial you are going to find one named heatmap.r, which illustrates how you can create a calendar heat map in R. However, the R implementation of the calendar heat map is much more complex than the Python 3 version. Always use the best tool for the job.

What you should keep from this tutorial – apart from the fact that R is not just about statistics – are the presented R scripts and R code, which you can take away, modify and experiment with to produce the kind of calendars you want.

Figure 1: Output of the demo(graphics) command as executed in the RStudio environment that was installed and executed as a Docker image.
Figure 2: How to work with times, dates and locales in R, how to convert from epoch time and output of the help(DateTimeClasses) command.
Figure 3: A first try to create a calendar for January 2020 using ggcal and ggplot2. Although the output is not very sophisticated, it does its job pretty well.
Figure 4: This is a much prettier and more professional calendar that was created using the capabilities of ggplot2 and the code of dates.r.
Figure 5: Generated by the heatmap.py Python 3 script. Although R can also generate calendar heat maps, the Python 3 code is much simpler.
Figure 6: This shows a small part of the output of the install.packages("tidyverse") command. As tidyverse is in reality a meta package, it will install lots of R packages, including ggplot2.
