An Introduction to R
Welcome to a new series on ‘R, statistics and machine learning’. R is a programming language that was primarily designed for statistical computing and graphics. It is a multi-paradigm language that supports imperative, object-oriented, array and functional styles of programming. R is dynamically typed and is primarily written in C, Fortran and R itself.
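As a rough illustration of the styles mentioned above, here is a minimal sketch in R. The names ‘describe’ and ‘point’ are hypothetical and chosen only for this example; they are not part of R.

```r
# Dynamic typing: the same name can hold values of different types.
x <- 42L
class(x)                 # "integer"
x <- "forty-two"
class(x)                 # "character"

# Array style: arithmetic is applied element-wise to whole vectors.
v <- c(1, 2, 3, 4)
squares <- v^2           # 1 4 9 16

# Functional style: functions are values and can be passed to other functions.
doubled <- sapply(v, function(n) n * 2)   # 2 4 6 8

# Imperative style: explicit loops and assignment.
total <- 0
for (n in v) {
  total <- total + n
}
total                    # 10

# Object-oriented style (S3): a generic dispatches on the class attribute.
# 'describe' and 'point' are hypothetical names used only in this sketch.
describe <- function(obj) UseMethod("describe")
describe.point <- function(obj) paste0("point(", obj$x, ", ", obj$y, ")")
p <- structure(list(x = 1, y = 2), class = "point")
describe(p)              # "point(1, 2)"
```

S3 is only one of the object systems available in R; it is used here because it needs no extra packages.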
R is an official GNU package and is released under the GNU General Public License (version 2 or 3). It was first released in 1993 and, at the time of writing, the latest stable release is 4.0.4. The official home page of the R project is https://www.r-project.org/. In this new series of articles, we will explore the syntax and semantics of R, and also the various libraries available for statistics, graphics and machine learning.
Installation
Parabola GNU/Linux-libre: You can install R on Parabola GNU/Linux-libre using the Pacman package manager, as shown below:
$ sudo pacman -S r
The latest version that gets installed is 4.0.4-1, as indicated below:
extra/r 4.0.4-1 [installed]
    Language and environment for statistical computing and graphics
Debian/Ubuntu: The ‘r-base’ package needs to be installed on Ubuntu to get R on your system:
$ sudo apt install r-base
Fedora: The latest R version can be installed on Fedora using:
$ sudo dnf install R
Mac OS X: The ‘R.APP’ application can be installed from https://mac.r-project.org/ for Mac OS X. The website provides both the -devel and -stable releases for installation. Nightly builds of the R releases are provided as .pkg files. Please note that these releases for Mac OS X are still experimental in nature.
Windows: The ‘bin/windows/base’ directory on any of the CRAN mirrors listed at https://cran.r-project.org/mirrors.html provides an R-4.0.4-win.exe executable for R on Windows. If you would like to test the latest software, you can install the ‘r-patched’ or ‘r-devel’ snapshot releases as well. R is supported on Windows 7 and later, and the installation takes at least 150MB of disk space.
Emacs: As an Emacs user, you can install the ‘Emacs Speaks Statistics’ (ESS) package that provides support for working on R source files. The add-on includes syntax highlighting, code formatting, searching for documentation, displaying results, etc. The project website is available at https://ess.r-project.org/. With a Cask setup, you can simply add the following to your Cask file to install ESS:
(depends-on "ess")
You can also execute R code in an Emacs Org Babel code block. The following needs to be added to your Emacs configuration file:
(org-babel-do-load-languages
 'org-babel-load-languages
 '((emacs-lisp . t)
   (R . t)))
Consider the given code snippet in an Emacs Org file. When you use C-c C-c in the code block, it will execute the commands in an R environment and produce the result:
#+BEGIN_SRC R
sqrt(2)
#+END_SRC

#+RESULTS:
: 1.4142135623731
Usage
On Parabola GNU/Linux-libre, open a terminal and type ‘R’ at the shell prompt to invoke the R interpreter as shown below:
$ R
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type license() or licence() for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type ‘contributors()’ for more information and ‘citation()’ on how to cite R or R packages in publications.
Type demo() for some demos, help() for on-line help, or help.start() for an HTML browser interface to help.
Type ‘q()’ to quit R.
>
You can type q() at the prompt to exit from the session. R will then ask whether you would like to save the workspace image; press y to save it, n to discard it, or c to cancel and stay in the session.
> q()
Save workspace image? [y/n/c]: n
$
You can obtain the version of R that is installed from the terminal prompt using the ‘R --version’ command, as shown below:
$ R --version
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License versions 2 or 3. For more information about these matters see https://www.gnu.org/licenses/.
If you are at the R prompt, you can obtain the version information with the ‘version’ built-in as follows:
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          4
minor          0.4
year           2021
month          02
day            15
svn rev        80002
language       R
version.string R version 4.0.4 (2021-02-15)
nickname       Lost Library Book
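The version information is also available programmatically. The following is a small sketch using the built-in ‘R.version’ list, ‘R.version.string’ and ‘getRversion()’; the 4.0.0 threshold is an arbitrary value picked for illustration.

```r
# R.version holds the same data as 'version'; it is a list, so fields are read with $.
R.version$major
R.version$minor
R.version.string

# getRversion() returns a version object that supports comparisons,
# which is handy when a script needs a minimum R release.
if (getRversion() >= "4.0.0") {
  message("Running R 4.0.0 or later")
} else {
  message("Running an R release older than 4.0.0")
}
```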
There is also built-in help documentation that you can use with the ‘help’ function, as shown below:
> help()
help                   package:utils                   R Documentation
Documentation
Description:
‘help’ is the primary interface to the help systems.
Usage:
help(topic, package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"),
     try.all.packages = getOption("help.try.all.packages"),
     help_type = getOption("help_type"))
Arguments:
topic: usually, a name or character string specifying the topic for which help is sought. A character string (enclosed in explicit single or double quotes) is always taken as naming a topic.
If the value of ‘topic’ is a length-one character vector the topic is taken to be the value of the only element. Otherwise ‘topic’ must be a name or a reserved word (if syntactically valid) or character string.
See ‘Details’ for what happens if this is omitted.
...
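To make the usage above concrete, here is a brief sketch of looking up specific topics. The topics ‘sqrt’ and ‘hist’ are arbitrary choices for illustration.

```r
# Look up a topic by name; a quoted string is always taken as a topic.
h <- help("sqrt")
class(h)     # the object is rendered as a help page when it is printed

# '?' is shorthand for help(); at the prompt these are equivalent:
#   ?sqrt
#   help(sqrt)

# The 'package' argument restricts the lookup to one package.
h2 <- help("hist", package = "graphics")
```

Assigning the result to a variable, as done here, suppresses the display; typing the call directly at the prompt opens the help page.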
You can search for specific help using the help.search function, as shown below:
> help.search("histogram")
Help files with alias or concept or title matching ‘histogram’ using fuzzy matching:
graphics::hist          Histograms
graphics::hist.POSIXt   Histogram of a Date or Date-Time Object
graphics::plot.histogram
                        Plot Histograms
  Aliases: plot.histogram, lines.histogram
grDevices::nclass.Sturges
                        Compute the Number of Classes for a Histogram
KernSmooth::dpih        Select a Histogram Bin Width
lattice::histogram      Histograms and Kernel Density Plots
  Aliases: histogram, histogram.factor, histogram.numeric, histogram.formula
lattice::panel.histogram
                        Default Panel Function for histogram
  Aliases: panel.histogram
lattice::prepanel.default.bwplot
                        Default Prepanel Functions
  Aliases: prepanel.default.histogram
MASS::hist.scott        Plot a Histogram with Automatic Bin Width Selection
MASS::ldahist           Histograms or Density Plots of Multiple Groups
MASS::truehist          Plot a Histogram
Type ‘?PKG::FOO’ to inspect entries ‘PKG::FOO’, or ‘TYPE?PKG::FOO’ for entries like ‘PKG::FOO-TYPE’.
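The search can also be narrowed and its result inspected from code. The sketch below assumes the ‘graphics’ package is installed (it ships with R); restricting the search to it is just an illustrative choice.

```r
# Restrict the search to one package and keep the result instead of printing it.
res <- help.search("histogram", package = "graphics")
unique(res$matches$Topic)   # topics whose alias, concept or title matched

# apropos() is a related tool: it searches the names of objects on the
# search path rather than the help pages. The pattern is a regular expression.
apropos("^hist")
```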
The information on operators (arithmetic, for example) can be obtained with the question mark symbol followed by the operator, enclosed within back quotes as illustrated below:
> ?`%%`
Arithmetic              package:base              R Documentation
Arithmetic Operators
Description:
These unary and binary operators perform arithmetic on numeric or complex vectors (or objects which can be coerced to them).
Usage:
+ x
- x
x + y
x - y
x * y
x / y
x ^ y
x %% y
x %/% y
Arguments:
x, y: numeric or complex vectors or objects which can be coerced to such, or other objects for which methods have been written.
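As a quick illustration of the two less familiar operators in the usage list, ‘%%’ gives the remainder and ‘%/%’ the integer quotient. The numbers below are arbitrary.

```r
7 %% 3        # remainder: 1
7 %/% 3       # integer quotient: 2

# For a negative dividend the remainder takes the sign of the divisor.
(-7) %% 3     # 2
(-7) %/% 3    # -3

# The two operators are linked by x == (x %/% y) * y + x %% y,
# and both work element-wise on vectors.
x <- c(-7, 7, 10.5)
y <- 3
all.equal(x, (x %/% y) * y + x %% y)   # TRUE
```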
The packages that ship with R, such as ‘base’, ‘graphics’ and ‘grDevices’, come with a number of demos that you can try out from the R console. You can list them using the demo function:
> demo()
Demos in package ‘base’:

error.catching   More examples on catching and handling errors
is.things        Explore some properties of R objects and is.FOO() functions. Not for newbies!
recursion        Using recursion for adaptive integration
scoping          An illustration of lexical scoping.

Demos in package ‘graphics’:

Hershey          Tables of the characters in the Hershey vector fonts
Japanese         Tables of the Japanese characters in the Hershey vector fonts
graphics         A show of some of R’s graphics capabilities
image            The image-like graphics builtins of R
persp            Extended persp() examples
plotmath         Examples of the use of mathematics annotation

Demos in package ‘grDevices’:

colors           A show of R’s predefined colors()
hclColors        Exploration of hcl() space
...
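The listing can also be limited to a single package and read from code. This is a minimal sketch; ‘graphics’ is used only because it appears in the listing above.

```r
# With no topic, demo() returns the listing as an object instead of running anything.
d <- demo(package = "graphics")

# The 'results' component is a character matrix with one row per demo.
d$results[, c("Item", "Title")]
```

Any name in the ‘Item’ column can then be passed to demo() to run that demo.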
The following is an example of a rotated sinc function:
> demo(persp)
demo(persp)
---- ~~~~~

Type <Return> to start :
> require(grDevices); require(graphics)
> ## (1) The Obligatory Mathematical surface.
> ## Rotated sinc function.
The demo produces the graphical output shown in Figure 1. If you would like to see example code from R’s online documentation, you can use the ‘example’ function. For instance, the ‘colors’ example, which goes on to display the named colours including different shades of blue, is illustrated below:
> example(colors)
colors> cl <- colors()
colors> length(cl); cl[1:20]
[1] 657
 [1] "white"         "aliceblue"     "antiquewhite"  "antiquewhite1"
 [5] "antiquewhite2" "antiquewhite3" "antiquewhite4" "aquamarine"
 [9] "aquamarine1"   "aquamarine2"   "aquamarine3"   "aquamarine4"
[13] "azure"         "azure1"        "azure2"        "azure3"
[17] "azure4"        "beige"         "bisque"        "bisque1"

colors> length(cl. <- colors(TRUE))
[1] 502

colors> ## only 502 of the 657 named ones
colors>
colors> ## ----------- Show all named colors and more:
colors> demo("colors")
demo(colors)
---- ~~~~~~
Type