The last issue of OSFY carried the column ‘Exploring Big Data’, which took a look at Apache Spark. This article explores HBase, the Hadoop database, which is a distributed, scalable big data store. The integration of Spark with HBase is also covered.

OpenSource For You - GUEST COLUMN

Spark can work with multiple formats, including HBase tables. Unfortunately, I could not get the HBase Python examples included with Spark to work. Hence, you may need to experiment with Scala and Spark instead. The quick-start documentation with Scala code is fairly easy to read and understand, even if one knows only Python and not Scala (http://spark.apache.org/docs/1.2.1/quick-start.html). You can use the Scala example ‘HBaseTest.scala’ as the basis for further exploration.

Start the HBase server. Then, from the Spark installation directory, start the Spark shell with the HBase jars included in the driver classpath. For example:
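A minimal sketch of such a session, under two assumptions that do not come from the column: the `hbase` command is on the PATH, so that `hbase classpath` can supply the jar list, and a table named `test` already exists in HBase. The launch command is shown as a comment. The Scala lines after it are modelled on the bundled HBaseTest.scala and are meant to be pasted into the resulting shell, where `sc` is the SparkContext that the shell provides.

```scala
// Launch from the Spark installation directory (assumes `hbase` is on the PATH):
//   bin/spark-shell --driver-class-path "$(hbase classpath)"

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Picks up hbase-site.xml from the classpath, if present.
val conf = HBaseConfiguration.create()

// "test" is a hypothetical table name; replace it with an existing table.
conf.set(TableInputFormat.INPUT_TABLE, "test")

// Expose the HBase table as an RDD of (row key, row result) pairs.
val hBaseRDD = sc.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Number of rows read from the table.
hBaseRDD.count()
```

If the imports fail to resolve, the HBase jars are probably missing from the driver classpath, so check the launch command first.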
