Webservers: Varnish cache
Mihalis Tsoukalos teaches you how to install and configure the Varnish HTTP cache to speed up your website.
Mihalis Tsoukalos explains how to install and set up the Varnish HTTP Cache, so your websites never fall over, probably.
Varnish is a caching HTTP reverse proxy that acts as a transparent layer between your web server and its web clients. It caches content so that it can be delivered without the web client having to wait for the web server. Varnish therefore cannot operate on its own: it needs to sit in front of a properly configured web server. Its most significant advantage is that it reduces the response time of your sites.
This tutorial will showcase the basic Varnish functionality; however, Varnish can do many more things than those presented here. And although it can work with Nginx, this tutorial will use an Apache server – using Nginx [see tutorials, LXF222] instead of Apache is not difficult provided that you know how to make changes to the Nginx configuration. You can find a lot more information about Varnish by visiting the official site at https://varnish-cache.org. You can see a part of that site in the screen capture on this page (top right).
One warning in particular: it is highly recommended that you carry out all your Varnish experiments using a test machine or a virtual machine, because a wrongly configured Varnish process can limit the accessibility of your web sites by the rest of the world.
Fully varnished
Installing Varnish on a Debian machine is as simple as executing the following command with root privileges:
# apt-get install varnish
You can find out the version of Varnish you are using by executing the next command:
# varnishd -V
varnishd (varnish-4.0.2 revision bfe7cd1)
Copyright (c) 2006 Verdens Gang AS
Copyright (c) 2006-2014 Varnish Software AS
The Varnish installation includes many executables: varnishadm, varnishhist, varnishncsa, varnishtest, varnishd, varnishlog, varnishstat and varnishtop.
The main Varnish configuration directory is /etc/varnish but there are also other Varnish-related files inside /etc/default, as can be seen:
$ ls -l /etc/default/varnish*
-rw-r--r-- 1 root root 3768 Oct 14 2014 /etc/default/varnish
-rw-r--r-- 1 root root 514 Oct 14 2014 /etc/default/varnishlog
-rw-r--r-- 1 root root 799 Oct 14 2014 /etc/default/varnishncsa
The most important configuration files among them are /etc/varnish/default.vcl and /etc/default/varnish because these are the two places where the more critical parameters of Varnish are defined. The single most critical parameter of a running Varnish process is the size of its cache, because the size of the cache affects performance. However, this tutorial will stick with the default cache size.
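As a sketch of where the cache size lives, the fragment below shows how the daemon options seen in this tutorial would be laid out in /etc/default/varnish, which is read as a shell fragment. The multi-line layout is an assumption about your copy of the file; the values are the defaults used throughout this tutorial, and the figure after malloc, is the one you would change to resize the cache. On systemd-based systems the same -s option also appears on the ExecStart line of the varnish service file.

```shell
# /etc/default/varnish (excerpt; layout assumed, values are the defaults)
DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,256m"   # cache storage: kept in RAM (malloc), 256MB in size
```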
Basic varnishing
You can start Varnish as follows:
# service varnish start
# service varnishlog start
# service varnishncsa start
Similarly, you can stop all Varnish processes as follows:
# service varnish stop
# service varnishlog stop
# service varnishncsa stop
If you execute ps ax | grep varnish after starting Varnish, you will see at least three distinct processes running: the varnishd process, which does all the work of supporting the Varnish functionality; the varnishlog service, which runs the varnishlog utility as a service in order to write the Varnish log files; and the varnishncsa service, which is similar to the varnishlog service but writes its log files in the NCSA Common Log Format.
You can easily find out the way the varnish service (varnishd) was executed:
$ ps ax | grep varnishd | head -1
825 ? Ss 0:00 /usr/sbin/varnishd -j unix,user=vcache -F -a :6081 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
So, the previous Varnish instance uses port number 6081 (-a :6081), which is mainly used for testing purposes. In order for Varnish to transparently take control of the web traffic of your web site, Varnish should listen on the TCP port previously owned by the web server, and the web server must move to another available port. In other words, Varnish will be the process handling all incoming HTTP requests.
As a result, it is now time to change the current configuration on your Linux server and make Varnish listen to TCP port 80 and the Apache web server listen to TCP port 8080, which was chosen because it is easy to remember. On a Debian Linux system, you will first need to make two changes in two Varnish configuration files as shown in the following output of the diff utility:
# cd /etc/default/
# diff varnish varnish.orig
48c48
< DAEMON_OPTS="-a :80 \
---
> DAEMON_OPTS="-a :6081 \
# cd /etc/varnish/
# diff default.vcl default.vcl.orig
18c18
< .port = "8080";
---
> .port = "8888";
The first change tells Varnish to use port number 80, whereas the second change tells Varnish that its backend, which in this case is Apache, listens on TCP port 8080. Additionally, you will need to tell all Apache virtual hosts that used to listen on port number 80 to listen on TCP port 8080 from now on. This also requires a change to the value of the Listen directive in the /etc/apache2/ports.conf file. If you have other Apache virtual hosts that do not use TCP port 80, you should not make any changes to their configuration!
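The Apache side of the change can be sketched as a small shell helper. This is only an illustration: move_to_8080 is a hypothetical name, the patterns assume plain "Listen 80" and "<VirtualHost something:80>" lines, and anything more elaborate (IPv6 addresses, for example) is better edited by hand. In keeping with the rest of this tutorial, the helper keeps the untouched file with an .orig extension.

```shell
# Hypothetical helper: rewrite port 80 to 8080 in one Apache config file.
# The original is preserved next to it as <file>.orig.
move_to_8080() {
    cp "$1" "$1.orig"
    # First expression: "Listen 80" lines; second: "<VirtualHost addr:80>" lines.
    sed -e 's/^\([[:space:]]*Listen[[:space:]]\{1,\}\)80[[:space:]]*$/\18080/' \
        -e 's/^\([[:space:]]*<VirtualHost[[:space:]]\{1,\}[^:>]*:\)80>/\18080>/' \
        "$1.orig" > "$1"
}

# Possible usage as root (the virtual host file name is an assumption):
#   move_to_8080 /etc/apache2/ports.conf
#   move_to_8080 /etc/apache2/sites-available/000-default.conf
```

Lines that mention other ports, such as Listen 443, are left alone, which matches the advice about not touching virtual hosts that do not use port 80.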
Last, you will also need to perform the following actions:
# cp /lib/systemd/system/varnish.service /etc/systemd/system
# vi /etc/systemd/system/varnish.service
# diff /lib/systemd/system/varnish.service /etc/systemd/system/varnish.service
9c9
< ExecStart=/usr/sbin/varnishd -a :6081 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
---
> ExecStart=/usr/sbin/varnishd -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
# systemctl daemon-reload
# systemctl restart apache2.service
# systemctl restart varnish.service
The last two commands are used for restarting the two services. You should first restart Apache in order to free TCP port 80 and then Varnish, which will bind TCP port 80. So, the next question is “How can I be sure that I am using Varnish?” The answer is pretty simple: you can try accessing one of the Apache sites and see if it works!
Additionally, you can check the log files of Varnish, one of its various utilities or the output of the next command:
# ps ax | grep varnishd | grep -v grep | tail -1
7883 ? Sl 0:01 /usr/sbin/varnishd -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
If the value of the -a parameter is :80, then Varnish is properly configured.
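Another way to check, sketched below, is to look at the response headers: by default Varnish adds an X-Varnish header to the responses it delivers. The helper name served_by_varnish is hypothetical, and a site whose VCL strips that header would need a different test.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# if an X-Varnish header is present.
served_by_varnish() {
    grep -qi '^x-varnish:'
}

# Possible usage against your own server:
#   curl -sI http://localhost/ | served_by_varnish && echo "Varnish is answering"
```

Requesting the same page twice and watching the Age header grow on the second response is a further hint that the object came from the cache.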
Please keep in mind that giving Varnish too much disk space or RAM might not result in better performance, because it will make searching the cache slower. Depending on your data, a disk space of around 4GB might be a good starting point. Finding the optimum cache size is an art that requires constant tracking of Varnish performance with the help of tools such as varnishstat and varnishtop. Last, you should understand that the entire process is transparent as far as the web clients are concerned, which is extremely important.
Please keep in mind that it is very handy to keep a backup copy of each configuration file you modify because sometimes things can go wrong. In this tutorial the original files have the .orig extension.
Varnish logs
On a Debian system, the log files of Varnish can be found inside the folder /var/log/varnish/. There are actually two log files, named varnish.log and varnishncsa.log. The varnish.log file keeps Varnish log data in a format defined by Varnish, which is more readable but takes more screen space, whereas varnishncsa.log uses the familiar log format used in Apache log files.
The screen capture image on the previous page shows data from both varnish.log and varnishncsa.log in order to give you a better understanding of the kind of data stored in them. The varnishlog utility allows you to see the log files of Varnish in a better format.
It is very useful to know that Apache still writes the expected log entries in its own log files so you are not missing any traffic data.
The varnishncsa utility displays the log data that is written to varnishncsa.log whereas the varnishtest utility is for testing the Varnish cache. You can find more information about both of them by looking at their man pages.
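Because varnishncsa.log uses the familiar NCSA layout, ordinary text tools can summarise it. The sketch below counts the most requested URLs; top_urls is a hypothetical helper, and it assumes the default log format, in which the request target is the seventh whitespace-separated field.

```shell
# Hypothetical helper: reads NCSA-style log lines on stdin and prints
# "count url" pairs, most requested first (top 10 unless told otherwise).
top_urls() {
    awk '{ print $7 }' | sort | uniq -c | sort -rn | head -n "${1:-10}" |
        awk '{ print $1, $2 }'
}

# Possible usage as root:
#   top_urls 5 < /var/log/varnish/varnishncsa.log
```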
The screen capture image to the left shows the man page of varnishtop, which works like the top command line utility but for Varnish log files. However, the image on the opposite page is much more interesting because it actually shows varnishtop in action.
The varnishadm tool allows you to control a running Varnish process, whereas the varnishhist tool shows a histogram of Varnish requests, which can be useful for checking the performance of Varnish.
As you already know, Varnish comes with many useful utilities that inform us about its operation and performance. This section will talk about the varnishstat utility, which displays statistics about a running Varnish process using the various metrics provided by Varnish.
varnishstat can produce its output in XML, using the -x switch, and in JSON, using the -j switch; you can see the format of both kinds of output in the image below. As structured output is not especially handy for inspecting realtime data, the usefulness of XML and JSON output is that it allows you to store the generated data in a database in order to query it afterwards. You can also get one-off output in plain text format using the -1 option. If you use none of these three options, varnishstat will work similarly to the top utility and automatically update its output until you terminate it by pressing the Q key.
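One figure worth deriving from these statistics is the cache hit ratio. The sketch below works on the plain text produced by varnishstat -1 and uses the MAIN.cache_hit and MAIN.cache_miss counters; hit_ratio is a hypothetical helper name, and the calculation is the simple hits / (hits + misses) ratio rather than anything Varnish reports itself.

```shell
# Hypothetical helper: reads "varnishstat -1" style lines on stdin and prints
# the percentage of cache lookups that were hits, or "n/a" with no lookups.
hit_ratio() {
    awk '
        $1 == "MAIN.cache_hit"  { hit  = $2 }
        $1 == "MAIN.cache_miss" { miss = $2 }
        END {
            total = hit + miss
            if (total == 0)
                print "n/a"
            else
                printf "%.1f\n", 100 * hit / total
        }'
}

# Possible usage on the Varnish machine:
#   varnishstat -1 | hit_ratio
```

A ratio that stays low after the cache has warmed up suggests that either the cache is too small or much of the content is not cacheable.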
Secret varnishing
HTTP/2 is the latest version of the HTTP protocol and is focused on performance. The key differences between HTTP/2 and HTTP/1.1 include binary instead of text transmission of data; security using TLS, which browsers require in practice although the specification does not; multiplexing, which allows HTTP/2 to use a single TCP connection for parallelism; header compression; and support for ‘push’ responses. Despite these differences and improvements, most of which make HTTP/2 significantly faster than HTTP/1.1, all previous HTTP methods, status codes and semantics remain the same in HTTP/2. You might ask how it would be possible to debug HTTP/2 connections if they are encrypted. The solution is to use NSS keylogging along with the Wireshark plugin – this currently works with Chrome and Firefox web browsers. You can find more information about HTTP/2 at http://httpwg.org/specs/rfc7540.html.
TLS stands for Transport Layer Security and its main purpose is to provide security and privacy between two communicating applications. You can learn all the gory details of the TLS Protocol by looking at its RFC that can be found at https://tools.ietf.org/html/rfc5246.
In order for a program such as Varnish to implement caching, you need two main things: a way to decide whether a resource is cacheable and an efficient way to access and maintain your cache. In other words, there is no point in caching a dynamic page that changes every second and there is no point in having cache data that is too slow to search or needs too much time to maintain.
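To make the first requirement concrete, here is a deliberately simplified sketch of a cacheability test based on response headers. It is not how Varnish itself decides, since the real rules live in VCL and take more into account; looks_cacheable is a hypothetical helper that only rejects responses that set a cookie or whose Cache-Control header contains no-cache, no-store or private.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds only
# if none of them obviously rules out storing the response in a shared cache.
looks_cacheable() {
    ! grep -Eiq '^(set-cookie:|cache-control:.*(no-cache|no-store|private))'
}

# Possible usage against the Apache backend from this tutorial:
#   curl -sI http://localhost:8080/ | looks_cacheable && echo "worth caching"
```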
Varnish the future
Although this tutorial uses Varnish version 4, Varnish 5 is also available and offers some important new features. The most significant of them, currently in an experimental state, is support for HTTP/2, which allows you to test HTTP/2 traffic and see how it works with Varnish. However, keep in mind that HTTP/2 support is not enabled by default in Varnish 5. Another new characteristic of Varnish 5 is that it has changed from a feature release schedule to a time release schedule, which means that you can expect a new version of Varnish to appear every six months. As Varnish 5 was released in September 2016, its next version will be released in March 2017. Varnish 5 also works with TLS terminators, which terminate TLS connections on its behalf, and supports the PROXY protocol. Should you wish to use these two features, you can use Hitch TLS (https://hitch-tls.org).
Varnish 5 also introduces a new director called Shard, which is responsible for load balancing and has support for Negative Cache, which is a pretty advanced feature. There exist many more new features in Varnish 5 but talking about them is beyond the scope of this tutorial.
Keep in mind that you do not have to use every capability of Varnish, mainly because some of them are very specialised. The general idea is to start with the things you really need before moving on to the more advanced stuff, so that you really understand what you are doing.