The last issue carried the column ‘Exploring Big Data’, which took a look at Apache Spark. This article explores HBase, the Hadoop database, which is a distributed, scalable big data store. The integration of Spark with HBase is also covered.
Spark can work with multiple data formats, including HBase tables. Unfortunately, I could not get the HBase Python examples included with Spark to work. Hence, you may need to experiment with Scala and Spark instead. The quick-start documentation with Scala code is fairly easy to read and understand even if one knows only Python and not Scala (http://spark.apache.org/docs/1.2.1/quick-start.html). You can use the Scala example ‘HBaseTest.scala’ as the basis for further exploration.
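As a rough idea of what that example does, here is a minimal sketch along the same lines, written for the Spark shell rather than as a standalone program. It assumes a shell started with the HBase jars on the classpath (so that `sc`, the SparkContext, already exists), a running HBase server, and an existing table; the table name `test` is only a placeholder and not something taken from the example itself.

```scala
// Sketch only: paste into spark-shell, where `sc` is already defined.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Hypothetical table name; replace it with a table that exists in your HBase.
val tableName = "test"

// Picks up hbase-site.xml from the classpath, if one is present.
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, tableName)

// Each element of the RDD is a (row key, row contents) pair.
val hBaseRDD = sc.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Counting the rows forces a full scan of the table.
println(hBaseRDD.count())
```

The result is an ordinary RDD, so the usual transformations such as `map` and `filter` can be applied to the rows from here on.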
Start the HBase server. Then, from the Spark installation directory, start the Spark shell with the HBase jars included in the driver classpath. For example: