Processing big data by the petabyte
UK web-hosting company Fasthosts explains how bare metal servers can help devices process big data
In an increasingly interconnected world, where smartphones talk to doorbells and fridges have their own cameras, devices are collecting data like never before. But where is it all stored, and how is it processed?
New smart technology and ‘Internet of Things’ (IoT) devices are constantly being developed. Whether it’s automated thermostats or wearable technology, the collection, processing and analysis of large amounts of data help developers provide IoT services to consumers. This data comes in previously unimaginable quantities, hence the – perhaps understated – term “big data”.
For example, a smart thermostat application could be regularly collecting petabytes (fifteen zeroes) of data on temperature, humidity, time of day, etc. A smartwatch could collect similar amounts of data on things like heart rate, location, and distance travelled. A smart fridge takes photos of its contents and sends them over the internet, so a user can check whether they’ve run out of milk.
Overstretched Resources
These IoT devices are dependent on data – and lots of it. But storing masses of data, and performing calculations and analysis on it, places a serious strain on servers and so requires vast processing resources.
The most common solution to this problem is Apache Hadoop, which combines Google’s MapReduce programming model with the Hadoop Distributed File System (HDFS). By splitting data into blocks which are distributed over multiple nodes in a cluster of machines, the processing of big data is made much more efficient. Instead of processing and analysing one large block, Hadoop processes multiple smaller blocks in parallel, which speeds up processing times significantly.
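The idea behind MapReduce can be sketched in a few lines of Python. This is an illustration of the pattern, not Hadoop itself: a word count where the input is split into blocks, each block is counted ("mapped") in parallel, and the per-block results are merged ("reduced") into one answer.

```python
# Illustrative sketch of the MapReduce pattern (not Hadoop itself):
# split input into blocks, map each block in parallel, reduce the results.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_block(block):
    """Map step: count the words in one block of text."""
    return Counter(block.split())

def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(text, n_blocks=4):
    lines = text.splitlines()
    # Split the input into roughly equal blocks, much as HDFS splits
    # a large file into blocks spread across cluster nodes.
    blocks = [" ".join(lines[i::n_blocks]) for i in range(n_blocks)]
    with ThreadPoolExecutor() as pool:
        partials = pool.map(map_block, blocks)  # blocks processed in parallel
    return reduce_counts(partials)
```

In a real Hadoop cluster the "blocks" live on different machines and the map tasks run on the nodes that hold the data, which is where the efficiency gain comes from.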
Big-data processing has a unique set of server requirements. Heavy, resource-intensive processing demands the performance of physical hardware, but choosing that route means missing out on the scalability and flexibility offered by virtual servers.
With big-data processing the need for resources is short-term, inconsistent, and relatively sporadic. There might be long periods without any need for computational or analytical processing, but as soon as the processing begins the server needs to be able to handle a huge demand on resources. Because of the importance of high-performance resources, the traditional choice for big-data processing has been to run Hadoop on dedicated hardware. But this often results in over-provisioning – and inevitably, overpayment – of resources.
Bare Metal Servers
Bare metal servers offer the perfect middle-ground solution for the big-data processing conundrum. By combining the best bits of both dedicated hardware and virtualised machines, they provide a flexible and powerful option.
Bare metal servers are designed to deal with significant but short-term processing needs. Data can be stored, processed, or analysed on a server for as long as is necessary, and then the server can be spun down again. This way, resources are not over-provisioned, and there’s no need to continue running a server that’s no longer being used.
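That provision–process–release cycle might look like the sketch below. `ProviderClient`, `create_server` and `delete_server` are hypothetical names invented for illustration; a real provider’s API will differ.

```python
# Hypothetical sketch of a spin-up/process/spin-down lifecycle for a bare
# metal server. ProviderClient and its methods are invented for illustration.
import time

class ProviderClient:
    """Stand-in for a bare metal provider's API client (hypothetical)."""
    def create_server(self, plan):
        print(f"Provisioning bare metal server ({plan})...")
        return {"id": "bm-001", "plan": plan, "created": time.time()}

    def delete_server(self, server):
        print(f"Releasing server {server['id']} - billing stops here.")

def run_batch_job(server, job):
    """Placeholder for the actual processing workload (e.g. a Hadoop job)."""
    print(f"Running {job} on {server['id']}")

client = ProviderClient()
server = client.create_server(plan="high-memory")
try:
    run_batch_job(server, job="nightly-analytics")
finally:
    # Spin the server down as soon as the job finishes, so resources
    # are only paid for while they are actually in use.
    client.delete_server(server)
```

The `try`/`finally` matters: releasing the server even when a job fails is what keeps the pay-as-you-use model from quietly turning into an always-on bill.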
In a cloud-server infrastructure, there could be dozens of virtual machines running on the same physical server, each with its own processing requirements, and each fighting for the same resources. Bare metal servers, however, are single-tenant: the resources on each server are dedicated to a single user. There’s no resource contention or ‘noisy neighbour syndrome’ to worry about, and performance can never be degraded by the workload of other VMs on the same server – because there are none.
Bare metal servers can also be used as part of a network of virtual machines, allowing full flexibility of server architecture. On a virtual machine, the hypervisor consumes a share of resources and can degrade performance. Bare metal servers have no hypervisor layer, so this virtualisation overhead is eliminated entirely.
A bare metal server is fundamentally a dedicated server, insofar as it offers high-performance resources that are dedicated to one user, but comes with the advantage of flexible, pay-as-you-use billing with no contracts.
The combination of powerful resources and flexible, no-contract billing makes bare metal servers the go-to solution for big-data processing, where resource demands are heavy but short-lived.