Tuesday, June 13, 2017

NodeJS with HBase via HBase Thrift2 Part 1: Connect

Motivation


In their own right, both NodeJS and HBase are powerful tools: NodeJS for spinning up efficient APIs fast, and HBase for holding large amounts of data (in its own somewhat particular way). More importantly, HBase also solves the small-files issue on Hadoop. So combining them can make sense, but the combination is poorly documented.

HBase comes with a REST API and a Thrift API. The Thrift API is the more efficient of the two, even though the REST API returns JSON, which JavaScript turns directly into objects. The reason is that Thrift uses a binary wire format, which is more compact than the JSON used by the REST API. There is an older GitHub page with some benchmarking: https://github.com/stelcheck/node-hbase-vs-thrift

At the time of writing, the latest stable version of HBase is 1.2.6. It has two Thrift interfaces, called HBase Thrift and HBase Thrift2. HBase Thrift is the more general/administrative one, where tables can be created and deleted and data manipulated: the stuff I prefer to do in the HBase shell, and not from a service. HBase Thrift2 is data only: CRUD, plus batch operations which are not found in HBase Thrift.

HBase Part

To make this post complete, we'll go from creating a table in HBase to connecting to it from NodeJS.


Table creation in HBase from the HBase shell

 create_namespace 'foo'  
 create 'foo:bar', 'family1'  

Start HBase Thrift2 API from OS shell

 bin/hbase-daemon.sh start thrift2  

NB! By default, HBase Thrift and HBase Thrift2 are both set up to listen on port 9090, with the info web UI on port 9095. If you want them to run concurrently, it is possible to set custom port numbers for the APIs.

NB! The HBase Thrift API can crash due to lack of heap memory. The heap size can be increased in the config file conf/hbase-env.sh:
 # The maximum amount of heap to use. Default is left to JVM default.  
 export HBASE_HEAPSIZE=8G  

Good to go

NodeJS part

Prerequisites, besides having NodeJS installed, are the Thrift compiler and the HBase Thrift definition file. A Thrift definition file acts both as documentation and as the definition from which service/client proxies are built.

The Thrift compiler can be found on Apache's Thrift homepage: https://thrift.apache.org/
The HBase Thrift definition file can be found in the HBase source package from the HBase homepage: https://hbase.apache.org/

Start the NodeJS project and add the Thrift package

 mkdir node_hbase  
 cd node_hbase  
 npm init  
 npm install thrift  

Create the proxy client package from the HBase Thrift definition file

 thrift-0.10.0.exe --gen js:node hbase-1.2.6-src\hbase-1.2.6\hbase-thrift\src\main\resources\org\apache\hadoop\hbase\thrift2\hbase.thrift  

Create the index.js file (you can call it whatever you want)

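 var thrift = require('thrift');  
 // Proxy client and types generated by the Thrift compiler in the previous step.  
 // The *_types.js name follows the .thrift file name, so mind the case on case-sensitive file systems.  
 var HBaseService = require('./gen-nodejs/THBaseService.js');  
 var HBaseTypes = require('./gen-nodejs/hbase_types.js');  // not used yet in this part  
 var connection = thrift.createConnection('IP or DNS to your HBase server', 9090); 
 
 connection.on('connect', function () {  
   var client = thrift.createClient(HBaseService, connection);  
   // Ask HBase which region servers host the regions of table 'foo:bar'  
   client.getAllRegionLocations('foo:bar', function (err, data) {  
     if (err) {  
       console.log('error:', err);  
     } else {  
       console.log('All region locations for table:' + JSON.stringify(data));  
     }  
     // Close the connection so the script can exit  
     connection.end();  
   });  
 });
  
 // Connection-level failures (server down, wrong host or port) end up here  
 connection.on('error', function (err) {  
   console.log('error:', err);  
 });  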
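If the Thrift2 API runs on a custom port, it can be handy to keep the host and port out of the source. The following is only a minimal sketch of one way to do that: the helper name resolveThriftEndpoint and the environment variable names HBASE_THRIFT_HOST and HBASE_THRIFT_PORT are made up for this example and are not part of HBase or the thrift package.

```javascript
// Hypothetical helper: resolve the Thrift2 endpoint from environment variables.
// HBASE_THRIFT_HOST and HBASE_THRIFT_PORT are names invented for this sketch.
function resolveThriftEndpoint(env) {
  var host = env.HBASE_THRIFT_HOST || 'localhost';
  // Fall back to 9090, the port used in index.js above
  var port = parseInt(env.HBASE_THRIFT_PORT || '9090', 10);
  if (isNaN(port) || port < 1 || port > 65535) {
    throw new Error('Invalid Thrift port: ' + env.HBASE_THRIFT_PORT);
  }
  return { host: host, port: port };
}

var endpoint = resolveThriftEndpoint(process.env);
console.log('Thrift2 endpoint: ' + endpoint.host + ':' + endpoint.port);
```

In index.js the result would then feed the connection call, as in thrift.createConnection(endpoint.host, endpoint.port).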

Run the js script and get some result

 node index.js  
 All region locations for table:[{"serverName":{"hostName":"localhost","port":49048,"startCode":{"buffer":{"type":"Buffer","data":[0,0,1,92,160,234,132,254]},"offset":0}},"regionInfo":{"regionId":{"buffer":{"type":"Buffer","data":[0,0,0,0,0,0,0,0]},"offset":0},"tableName":{"type":"Buffer","data":[102,111,111,58,98,97,114]},"startKey":{"type":"Buffer","data":[]},"endKey":{"type":"Buffer","data":[]},"offline":false,"split":false,"replicaId":0}}]
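
The result is full of byte arrays, because Thrift binary fields arrive as Buffers and 64-bit integers arrive wrapped around an 8-byte buffer. As a rough sketch of how to read them, the snippet below decodes two values copied from the output above. The helper names decodeText and decodeInt64 are made up for this example, and reading startCode as a timestamp in epoch milliseconds is my assumption.

```javascript
// Values copied from the JSON output above
var tableNameBytes = [102, 111, 111, 58, 98, 97, 114];
var startCodeBytes = [0, 0, 1, 92, 160, 234, 132, 254];

// Hypothetical helper: binary fields such as tableName are plain bytes,
// here assumed to be UTF-8 text
function decodeText(bytes) {
  return Buffer.from(bytes).toString('utf8');
}

// Hypothetical helper: a Thrift i64 is 8 bytes, big-endian.
// Combining two unsigned 32-bit halves is only exact for non-negative
// values up to Number.MAX_SAFE_INTEGER, so anything else is rejected.
function decodeInt64(bytes) {
  var buf = Buffer.from(bytes);
  var high = buf.readUInt32BE(0);
  var low = buf.readUInt32BE(4);
  var value = high * 4294967296 + low;
  if (!Number.isSafeInteger(value)) {
    throw new Error('i64 value does not fit in a JavaScript number');
  }
  return value;
}

var tableName = decodeText(tableNameBytes);
var startCode = decodeInt64(startCodeBytes);

console.log(tableName); // foo:bar
console.log(startCode); // 1497348343038
// Assumption: startCode is the region server start time in epoch milliseconds
console.log(new Date(startCode).toISOString()); // 2017-06-13T10:05:43.038Z
```

Inside the running script the 64-bit fields are objects rather than plain arrays (the buffer/offset shape in the output suggests the node-int64 wrapper used by the thrift package), so there you would read the bytes from their buffer property first.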