OpenSource For You

Why is Big Data So Important for Enterprises?

In today’s world, there is an explosion of data from social media and other sources. Enterprises carefully harvest this data and store it for meaningful reuse. Handling such vast quantities of data requires specialised tools and techniques, and this is where Big Data comes in.

- By: Vivek Ratan. The author has a B.Tech in electronics and instrumentation engineering. He works on various software automation testing tools and Android application development, and is currently working as an automation test engineer at Infosys, Pune.

Logging in to our Facebook account, uploading pictures to Instagram or taking a peek at the products on Flipkart, Amazon or Snapdeal has become part of our daily routine. The day feels incomplete when we don’t check our messages on WhatsApp. The tech-savvy world is being ruled by online social media like Facebook, WhatsApp, Twitter and so on. So have you ever given a thought to the petabytes and exabytes of data being generated daily by social media and various enterprise applications? According to Wikipedia, 2.5 exabytes of data are created every day by various online apps. It is quite difficult to manage and work with such a large amount of data.

Big Data, as the name itself suggests, refers to huge amounts of data that are difficult to capture, manage or process, even with the help of various software tools. Big Data requires the use of techniques and technologies such as predictive modelling of user behaviour and other advanced data analytics to obtain useful insights that can be leveraged further. According to Wikipedia, Big Data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Such data needs to be acquired, organised and analysed computationally to identify patterns or trends that further facilitate its processing, updating and management.

The five Vs of Big Data

We can identify Big Data with the help of the following characteristics:

1. Volume: Big Data is characterised largely by the quantity of generated and stored data.

2. Variety: The type and nature of the data helps those who analyse it to use the resulting insights effectively.

3. Velocity: Big Data is also identified by the rate at which the data is generated and processed to meet various demands.

4. Variability: A data set can be considered Big Data if it is inconsistent, hampering the processes used to handle and manage it.

5. Veracity: In some data sets, the quality varies greatly, and analysing such sets becomes a challenging task, as this leads to a lot of confusion during analysis.

The various challenges associated with such large amounts of data include:

1. Searching, sharing and transferring

2. Curating the data

3. Analysis and capture

4. Storage, updating and querying

5. Information privacy

How enterprises began leveraging Big Data

Considering the tremendous increase in demand for online enterprise applications nowadays, the present era could well be named the Enterprise Era. This is best illustrated by the fact that around 1 million transactions per hour are tracked by Walmart. This statistic makes one ponder how difficult it has become for enterprise applications to track and use such mammoth amounts of unstructured data.

Clearly, using data effectively can be a difficult task, especially with the increasing number of new data sources, the requirement for fresh data, and the need for increased processing speed. Hence, for advanced operational efficiencies and accelerated business growth, enterprises need to address and overcome these challenges. Various Big Data techniques and methodologies are being adopted to process such unstructured data sets and extract the ‘right data’ (that which is sufficient and appropriate for use) from them.

In the recent past, many enterprises have invested heavily in developing data warehouses. These serve as central data systems for reporting and for the extract, transform and load (ETL) processes that ingest data from different databases and other sources, both inside and outside the enterprise. As the variety, velocity and volume of data continue to increase, they are overloading these expensive enterprise data warehouses and causing a significant processing burden.

To get rid of this bottleneck, organisations are opting for open source tools like Hadoop to offload data warehouse processing functions. Hadoop can help organisations lower costs and become highly efficient when used alongside existing data warehouses. However, as Hadoop requires special skill sets to deploy, organisations have started trying out alternatives. A solution developed by the combined efforts of Dell, Intel, Cloudera and Syncsort is based on a use-case driven Hadoop Reference Architecture. This technology simplifies data processing with the help of an architecture that helps users optimise an existing data warehouse. The offloading solution provides a Hadoop environment using Cloudera Enterprise software. The Cloudera Distribution of Hadoop (CDH) delivers all the core elements of Hadoop, such as scalable storage and distributed computing. It allows users to reduce the Hadoop deployment period to just a few weeks, develop Hadoop jobs within hours, and become completely productive. CDH also ensures high availability, security and integration with a large set of other tools.

The Big Data enterprise model

Let’s have an overview of the general Big Data model that enterprises are implementing, which mainly consists of the intermediate systems and processes described below.

Data source: These are the data sets on which different Big Data techniques are implemented. They can exist in an unstructured, semi-structured or structured format. Unstructured data sets are extracted from social media applications in the form of images, audio/video clips or text. Semi-structured data sets are generated by different machines and require less effort to convert to the structured form. Some data sets are already in the structured form, as in the case of transaction information from online applications or other master data.

Acquire: After the various types of data sets are taken in from their sources, they can either be written straight away to real-time memory processes, or written as messages to disk, database transactions or files. Once they are received, there are various options for persisting this data. It can be written to file systems, to an RDBMS, or to distributed, clustered systems such as NoSQL databases and the Hadoop Distributed File System (HDFS).
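As an illustration of these persistence options, here is a minimal sketch in Python (standard library only) that lands the same incoming messages in two ways: as a file of JSON documents, one per line, and as rows in a relational table. The message fields and the table layout are invented for the example; a real acquisition layer would write to HDFS or a NoSQL store instead of a local file and SQLite.

```python
import json
import sqlite3

# Hypothetical incoming messages from a social media feed (invented format).
messages = [
    {"user": "alice", "event": "upload", "bytes": 2048},
    {"user": "bob", "event": "login", "bytes": 0},
]

# Option 1: persist raw messages to a file, one JSON document per line
# (the common layout when landing data on a file system).
with open("events.jsonl", "w") as f:
    for msg in messages:
        f.write(json.dumps(msg) + "\n")

# Option 2: persist the same messages as rows in an RDBMS table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, event TEXT, bytes INTEGER)")
conn.executemany("INSERT INTO events VALUES (:user, :event, :bytes)", messages)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

Either sink preserves the raw records so that the later organise and analyse stages can reprocess them as needed.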

Organise: This is the process of organising the acquired data sets so that they are in an appropriate form for further analysis. The quality and format of the data are changed at this stage by using techniques that quickly evaluate unstructured data, such as running the map-reduce process in batch (Hadoop) or in memory (Spark). Other evaluation options are available for real-time streaming data as well. These are extensive processes that enable an open ingest, data warehouse, data reservoir and analytical model, extending across all types of data and domains by bridging the bi-directional gap between the new and traditional data processing environments. One of their most important features is that they meet the criteria of the four Vs: a large volume and velocity, a variety of data sets, and finding value wherever our analytics operate. In addition, they provide all sorts of data quality services, which help in maintaining metadata and keeping track of transformation lineage as well.
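The map-reduce idea mentioned above can be sketched on a single machine in a few lines of Python. This is only a toy illustration of the map, shuffle and reduce phases that frameworks like Hadoop (in batch, on disk) and Spark (in memory) run across whole clusters; the input records are made up.

```python
from collections import defaultdict

# Toy input: each string stands for one record of unstructured text.
records = [
    "big data needs big tools",
    "big clusters process data",
]

# Map phase: emit (key, value) pairs from each input record.
mapped = [(word, 1) for line in records for word in line.split()]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each key's values into a single result.
counts = {key: sum(values) for key, values in groups.items()}
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In a real deployment the map tasks run in parallel on the nodes holding the data, and the shuffle moves intermediate pairs over the network to the reducers, which is what makes the pattern scale to the data volumes discussed here.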

Analyse: After the data sets are converted to an organised form, they are analysed further. The processing output of Big Data, having been converted from low-density to high-density data, is loaded into a foundation data layer. Apart from the foundation data layer, it can also be loaded into data warehouses, data discovery labs (sets of data stores, processing engines and their analysis tools), data marts, or back into the reservoir. As the discovery lab requires fast connections to the event processing system, data reservoir and data warehouse, a high-speed network like InfiniBand is required for data transport. This is where the reduction results from processing the output of Big Data are loaded into the data warehouse for further analysis.

We can see that both the reservoir and the data warehouse offer in-situ analytics, which means that analytical processing can take place at the source system, without the extra step of moving the data to some other analytical environment. SQL analytics allows all sorts of simple and complex analytical queries at each data store, independently. Hence, this is the point where the performance of the system plays a big role: the faster the data is processed or analysed, the quicker the decision-making process. Performance can be improved by several orders of magnitude using options like columnar databases, in-memory databases or flash storage.

Decide: This is where the decision-making processes take place, using several advanced techniques to arrive at a final outcome. This layer consists of various real-time, interactive and data modelling tools, which can query, report on and model data while leaving the bulk of the data in place. These tools include advanced analytics, in-reservoir and in-database statistical analysis and advanced visualisation, as well as traditional components such as reports, alerts, dashboards and queries.

Significance and role of Big Data for enterprise applications

Big Data plays quite a significant role in a number of enterprise applications, which is why large enterprises are spending millions on it. Let’s look at a few scenarios in which enterprises benefit by implementing Big Data techniques.

1. The analysis and distillation of Big Data, in combination with traditional enterprise data, gives enterprises a more thorough and insightful understanding of their business. It can lead to greater productivity, greater innovation and a stronger competitive position.

2. Big Data plays an even more important role in healthcare services. It helps in the management of chronic or other long-term conditions by using in-home monitoring devices, which measure vital signs and track patients’ progress to improve their health and reduce both hospital admissions and visits to doctors’ clinics.

3. Manufacturing companies also deploy sensors in their products to gather data remotely, as in the case of General Motors’ OnStar or Renault’s R-Link. These help in delivering communications, navigation and security services. They also reveal usage patterns, failure rates and other opportunities for product improvement that can further reduce assembly and development costs.

4. The phenomenal increase in the use of smartphones and other GPS devices gives advertisers an opportunity to target consumers when they are in close proximity to a store, restaurant or coffee shop. Retailers get to know the avid buyers of their products better. Social media and Web log files from their e-commerce sites help them learn about those who didn’t buy their products, and the reasons why they chose not to. This can lead to more effective micro-targeted marketing campaigns, as well as improved supply chain efficiencies resulting from more accurate demand planning.

5. Finally, social media websites like Facebook, Instagram, Twitter and LinkedIn wouldn’t exist without Big Data. The personalised experience they provide can only be delivered by storing and using all the available data about each user or member.

How secure is Big Data for enterprise apps?

Since Big Data handles all sorts of significant data belonging to several organisations, which may or may not be related to each other or to their users, it is very important that it offers a high grade of security, so that enterprises can implement it without fear. Big Data platforms provide a comprehensive approach to data security.

1. They ensure that the right people (internal or external) get access to the appropriate information and data at the right time and place, through the right channel (typically using Kerberos).

2. High security prevents malicious attacks and protects the organisation’s information assets by encrypting and securing the data while it is in motion or at rest (for example, using Cloudera Navigator Encrypt).

3. They enable organisations to separate roles and responsibilities, and protect all sensitive data without compromising privileged user access, such as that of DBAs, using various data masking and subsetting techniques.

4. They also extend auditing, monitoring and compliance reporting from traditional data management to Big Data systems.

Surprising statistics

Ninety per cent of the world’s data has been created in the last two years, and Big Data is projected to grow into a US$ 50 billion market by 2017, up from US$ 12 billion in 2012. Seventy per cent of the digital universe (900 exabytes) is generated by users, while enterprises store 80 per cent of all their data. The White House administration is investing US$ 200 million in Big Data research projects, and China will account for one-fifth of the world’s data by 2020. A 10 per cent increase in data accessibility translates into an additional US$ 65.7 billion in net income for a typical Fortune 1000 company.

According to Tom Peters, a best-selling author of business management books, “Organisations that do not understand the overwhelming importance of managing data and information as tangible assets in the new economy, will not survive.” Hence, the promise of Big Data has really motivated businesses and enterprises to invest in it. We need to continue researching and designing new technologies and techniques to provide a fast and reliable path for businesses to adopt Big Data, and to keep improving our skills with the various open source tools that deliver the best Big Data capabilities. This will help enterprises manage their ever-increasing data sets and use them effectively.

References
[1] Oracle Enterprise Architecture White Paper, March 2016
[2] https://en.wikipedia.org
[3] http://wikibon.org/
[4] http://www.baselinemag.com/

Figure 1: Journey for a data-driven enterprise (Image credits: Google Images)
Figure 2: Architecture implemented by enterprises for Big Data (Image credits: Oracle Big Data Guide)
Figure 3: Enterprise analytics planning framework (Image credits: Google Images)
Figure 4: Work flow diagram for the Big Data enterprise model (Image credits: Google Images)
Figure 5: Big Data market forecast (Image credits: Google Images)
