OpenSource For You

Is HBase a Helpful Database?

HBase is a non-relational database that runs on top of HDFS. This article gives an overview of HBase, discussing its benefits and limitations.

- By: Neetesh Mehrotra. The author works at TCS as a systems engineer. His areas of interest are Java development and automation testing. You can contact him at mehrotra.neetesh@gmail.com.

HBase is an open source, distributed, non-relational database developed by the Apache Software Foundation, which runs on top of HDFS. It is modelled on Google's Bigtable and is written mainly in Java. It is designed to provide quick random access to large amounts of structured data.

One can store data in HDFS either directly or through HBase. HBase is natively integrated with Hadoop and works alongside other data access engines through YARN.

HBase’s features

1. HBase tables are distributed across the cluster via regions, which are automatically split and redistributed as the data grows.

2. HBase is linearly scalable and has automatic failover support.

3. It integrates with Hadoop, both as a source and as a destination.

4. HBase supports an easy-to-use Java API for programmatic access.
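The automatic splitting mentioned in the first feature can be illustrated with a small, self-contained Java sketch. This is a toy model and not HBase code: the class name RegionSplitSketch and the row-count threshold are assumptions made for illustration only (HBase itself decides when to split based on the size of the stored data, not the number of rows).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of region auto-splitting. Hypothetical class, not part of HBase.
class RegionSplitSketch {
    // Assumption: split on row count. Real HBase splits on stored data size.
    private final int maxRowsPerRegion;
    // Regions are kept in row-key order; each one is a sorted slice of the table.
    private final List<TreeMap<String, String>> regions = new ArrayList<>();

    RegionSplitSketch(int maxRowsPerRegion) {
        if (maxRowsPerRegion < 1) {
            throw new IllegalArgumentException("maxRowsPerRegion must be >= 1");
        }
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.add(new TreeMap<>()); // a new table starts as a single region
    }

    void put(String rowKey, String value) {
        int i = indexOfRegionFor(rowKey);
        TreeMap<String, String> region = regions.get(i);
        region.put(rowKey, value);
        if (region.size() > maxRowsPerRegion) {
            split(i);
        }
    }

    String get(String rowKey) {
        return regions.get(indexOfRegionFor(rowKey)).get(rowKey);
    }

    int regionCount() {
        return regions.size();
    }

    // The owner is the last region whose first key is <= rowKey;
    // keys smaller than every boundary fall into the first region.
    private int indexOfRegionFor(String rowKey) {
        for (int i = regions.size() - 1; i > 0; i--) {
            if (regions.get(i).firstKey().compareTo(rowKey) <= 0) {
                return i;
            }
        }
        return 0;
    }

    // Move the upper half of an oversized region into a new region
    // placed immediately after it.
    private void split(int i) {
        TreeMap<String, String> region = regions.get(i);
        List<String> keys = new ArrayList<>(region.keySet());
        String splitKey = keys.get(keys.size() / 2);
        TreeMap<String, String> upper = new TreeMap<>(region.tailMap(splitKey, true));
        region.tailMap(splitKey, true).clear();
        regions.add(i + 1, upper);
    }

    public static void main(String[] args) {
        RegionSplitSketch table = new RegionSplitSketch(4);
        for (int i = 0; i < 10; i++) {
            table.put(String.format("row%02d", i), "value" + i);
        }
        System.out.println("Regions after 10 puts: " + table.regionCount());
        System.out.println("row03 -> " + table.get("row03"));
    }
}
```

In a real cluster, the new regions would also be reassigned across region servers, which this single-process sketch leaves out.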

Architecture

HBase’s architecture is composed of three types of components: the client library, a master server and region servers. Region servers can be added or removed as requirements change.

Master server: This acts as a monitoring agent for all the region server instances present in the cluster, and as the interface for all metadata changes. It maintains the state of the cluster by assigning regions to region servers and balancing the load across them. It is responsible for schema changes and other metadata operations such as the creation of tables and column families.

Regions: Regions are the basic building blocks of an HBase cluster. Tables are split into regions, which are spread across the region servers. Each region contains multiple stores, one for each column family, and each store comprises mainly two components: the MemStore and HFiles.
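How the MemStore and HFiles work together can be sketched in a few lines of plain Java. The following is a simplified illustration, not HBase's implementation: the class StoreSketch and its entry-count flush threshold are assumptions, and details such as the write-ahead log, timestamps and compaction are left out. Writes land in a sorted in-memory map; when it fills up, it is frozen as an immutable 'file', and reads check memory first and then the files from newest to oldest.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeMap;

// Toy model of one store (one column family in one region). Hypothetical class.
class StoreSketch {
    // Assumption: flush on entry count. Real HBase flushes on memory size.
    private final int flushThreshold;
    // In-memory, sorted write buffer (stands in for the MemStore).
    private TreeMap<String, String> memStore = new TreeMap<>();
    // Immutable sorted snapshots (stand in for HFiles), newest first.
    private final Deque<TreeMap<String, String>> hFiles = new ArrayDeque<>();

    StoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        memStore.put(rowKey, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Freeze the current buffer as the newest file and start a fresh buffer.
    private void flush() {
        hFiles.addFirst(memStore);
        memStore = new TreeMap<>();
    }

    // Newest data wins: check memory first, then files from newest to oldest.
    String get(String rowKey) {
        if (memStore.containsKey(rowKey)) {
            return memStore.get(rowKey);
        }
        for (TreeMap<String, String> file : hFiles) {
            if (file.containsKey(rowKey)) {
                return file.get(rowKey);
            }
        }
        return null;
    }

    int fileCount() {
        return hFiles.size();
    }

    public static void main(String[] args) {
        StoreSketch store = new StoreSketch(2);
        store.put("a", "1");
        store.put("b", "2"); // buffer is full, so it is flushed to a file
        store.put("a", "3"); // newer value for "a" stays in memory
        System.out.println("a -> " + store.get("a"));
        System.out.println("files: " + store.fileCount());
    }
}
```

Because flushed files are never modified, an update to an existing row is simply a newer entry that shadows the older one during reads.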

Region server: A region server hosts regions and serves the read and write requests for them. When it receives a request from the client, it passes the request to the specific region where the relevant column family resides. The client contacts region servers directly; it does not need the master's permission to communicate with them. The client needs the master's help only for operations related to metadata and schema changes.
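To show why the master need not sit in the read/write path, here is a minimal Java sketch of how a client could map a row key to the region server responsible for it. The class RegionLocatorSketch and the server names are hypothetical; the sketch only assumes that each region covers a sorted range of row keys beginning at a known start key.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of client-side region lookup. Hypothetical class, not the HBase API.
class RegionLocatorSketch {
    // Region start key -> name of the region server hosting that region.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void assign(String startKey, String serverName) {
        serverByStartKey.put(startKey, serverName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch locator = new RegionLocatorSketch();
        locator.assign("", "regionserver-1");  // rows before "m"
        locator.assign("m", "regionserver-2"); // rows from "m" up to "t"
        locator.assign("t", "regionserver-3"); // rows from "t" onwards
        System.out.println("apple -> " + locator.locate("apple"));
        System.out.println("mango -> " + locator.locate("mango"));
        System.out.println("zebra -> " + locator.locate("zebra"));
    }
}
```

Once the client knows which server owns a row, it can send the read or write straight to that server.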

Why use HBase?

Today, many Web applications store billions of rows. Searching for a few particular rows in such a large amount of data takes a lot of time. In such a situation, HBase is a good choice, as the query fetch time is short. Conventional relational data models fail to meet the performance requirements of very big databases.

[Figure: HBase vs RDBMS]

HBase’s limitations

1. HBase cannot execute SQL-style functions. It does not support SQL, so it has no query optimiser.

2. HBase cannot be used as a complete replacement for conventional relational models, since it does not suit every use case.

3. HBase, when integrated with Pig and Hive jobs, sometimes causes memory issues in the cluster.

You can download HBase from the Apache website; at the time of writing, the latest version is 1.2.4. The HBase team recommends installing it in a UNIX/Linux environment; to run it on Windows, you may need to download and install Cygwin.

[Figure: HBase vs HDFS vs Hive]
