Idealog

Big data

There’s data and there’s big data. Cynics will tell you the latter term is just another buzzword designed to help technology companies sell systems to large companies.

There’s a grain of truth in that idea. In 2013, Gartner, a technology industry research business, told subscribers big data had moved through the hype cycle from the “peak of inflated expectations” to the “trough of disillusionment”. That, by the way, isn’t as awful as it sounds. The next step on Gartner’s cycle is the “plateau of productivity”.

Yet when “big data” first appeared it had a specific meaning: it was the name given to dealing with amounts of data far greater than the processing capacity of everyday systems. The implication is that when you have these vast amounts of data, you can uncover insights that would be impossible to find by traditional means.

If there’s too much data for conventional computers and storage devices, or if it moves too fast, or is disorganised, then your project qualifies as big data. And that means you’ll need to find ways of dealing with it that go beyond conventional database technology.

One of the best examples of big data in action is seismic surveying for energy exploration. It’s the sort of thing the New Zealand crown research institute GNS Science (previously the Institute of Geological and Nuclear Sciences) does a lot of. Basically, seismic waves (the same tool used to study earthquakes) are sent deep into the ground from a huge truck or a boat, and the wave field reflected from each rock layer the waves hit is recorded at the surface by sensors.

When you've got regular echo explosions coming from each of a group of exploration boats 24 hours a day, that’s one hell of a lot of data.

The trouble is, as the GNS experts know all too well, the more accurate the image you get of all that stuff underground, the more likely you are to find some oil or gas.

So as GNS’s ability to deal with large datasets increases, its customers keep piling on more data.

One of the problems for a lot of companies is that the data flow isn’t even – sometimes you have nose-bleed high peaks in the information coming in, sometimes it’s just a drop or two. However, dealing with the peaks and troughs doesn’t necessarily mean buying expensive hardware. You can buy computer power on an as-you-need-it basis from cloud computing companies.

Still, one part of the big data equation is often pricey: the skills to organise, analyse and interpret complex projects are rare, so the practitioners get to charge accordingly.

Three characteristics separate a true big data project from everyday data: volume, velocity and variety.

VOLUME: The sheer amount of data is the key point. With more data, analysts can build better models to understand whatever they are looking for. The idea is that if you forecast, say, market conditions, comparing 400 data points will give you a more complete handle on where things are heading than just comparing five data points. While that’s true up to a point, the old “garbage in, garbage out” rule still applies.

Companies typically collect vast amounts of data that are difficult to store and move using everyday tools. This often includes internally generated data. A phone company might have databases on customer calling patterns and so on. However, it doesn't have to restrict analysis to its own data. It can buy databases from external agencies, or sift through social media, pulling out publicly available tweets or Facebook posts.

VELOCITY: The rate at which data is generated and captured is important. Companies need timely information. Real-time or near-real-time processing means a marketing campaign can be changed if, for example, there’s a negative response to an early advertisement. An online retailer might start gathering data when a customer enters the website and be able to cook up compelling offers to squeeze out more dollars before they get to the checkout.

It’s also important to have up-to-date competitive information. Armed with timely information about a rival initiative that is winning business, companies can automate, or near-automate, their responses. The player in any market with the fastest reactions to changing conditions has a solid competitive advantage.

VARIETY: Data isn’t always nice and tidy. Big data typically pulls information from structured and unstructured sources, which can be messy. Think of extracting information from tweets, Facebook updates, blog posts, online comments and video, as well as conventional relational databases. Increasingly, data is also collected from connected devices such as smartphones, smart electricity meters or embedded sensors. That’s only going to increase as more and more devices are connected to the internet.

Some big data boffins add a fourth V: VERACITY. This comes down to the trustworthiness of the incoming data. Traditional database technology works on the assumption the data is clean, precise and accurate. That’s often not the case with the material collected for big data projects. A Twitter user complaining about a product and tweeting their intention to stop doing business with a company might not be telling the truth. It’s possible for rivals to pollute data, something that’s hard to spot when you’re moving fast.

