OpenSource For You

The last issue of OSFY carried the column ‘Exploring Big Data’, which took a look at Apache Spark. This article explores HBase, the Hadoop database, which is a distributed, scalable big data store. The integration of Spark with HBase is also covered.

Spark can work with multiple formats, including HBase tables. Unfortunately, I could not get the HBase Python examples included with Spark to work. Hence, you may need to experiment with Scala and Spark instead. The quick-start documentation with Scala code is fairly easy to read and understand even if one knows only Python and not Scala (http://spark.apache.org/docs/1.2.1/quick-start.html). You can use the Scala example ‘HBaseTest.scala’ as the basis for further exploration.

Start the HBase server. Then, from the Spark installation directory, start the Spark shell with the HBase jars included in the driver classpath. For example:
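A minimal sketch of such a command follows. It assumes HBase is installed under /usr/share/hbase (a hypothetical path; set HBASE_HOME to match your system) and that every jar in its lib directory should go on the classpath. The helper name hbase_classpath is purely illustrative; --driver-class-path is the standard Spark launcher option.

```shell
# Assumption: HBase lives here; override by exporting HBASE_HOME first.
HBASE_HOME="${HBASE_HOME:-/usr/share/hbase}"

# Join every jar under <dir>/lib into one colon-separated classpath string.
hbase_classpath() {
  local cp="" jar
  for jar in "$1"/lib/*.jar; do
    [ -e "$jar" ] || continue      # skip the literal pattern if nothing matched
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}

# Launch the shell only when run from the Spark installation directory.
if [ -x bin/spark-shell ]; then
  bin/spark-shell --driver-class-path "$(hbase_classpath "$HBASE_HOME")"
else
  echo "Run this from the Spark installation directory." >&2
fi
```

In practice, you may prefer to list only the HBase client, common, protocol and server jars rather than the whole lib directory; which jars are needed depends on the HBase version in use.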
