How to Map Columns With Column Family in HBase

Apache HBase

Many of you are familiar with HBase. If you are not, HBase is a NoSQL database modeled after Google's BigTable paper. It aims to provide a key-value columnar database on top of HDFS, the Hadoop Distributed File System.

HBase lets you insert/query data indexed by a rowkey and organized into columns and families of columns. The rowkey is unique for each row, but it can be associated with an "unlimited" number of cells, where each cell corresponds to a given column. Columns are grouped by family so that columns belonging to the same family are always partitioned together.

HBase data model. Source: HBASE SCHEMA DESIGN and Cluster Sizing Notes ApacheCon Europe, November 2012

The column identifier is then specified by <COLUMN_FAMILY>:<COLUMN_NAME>

For instance, let's suppose we have one family called 'w' and the columns are: 'referral', 'n_clicks', 'rank'. The respective identifiers are: 'w:referral', 'w:n_clicks', 'w:rank'. Sometimes we want to organize our columns hierarchically so that we have multiple sub-columns of the same parent column, something like: 'n_clicks of 1st order', 'n_clicks of 2nd order' … or 'rank on topwebsites.foo.bar', 'rank on hottestdomains.foo.bar' and so on. One first thought would be to keep the same notation of the column family/qualifier, using the colon ':' character as delimiter: 'w:n_clicks:1', 'w:n_clicks:2', 'w:rank:tws', 'w:rank:hd'… HBase will convert the string representation of the column qualifier into bytes. Though HBase allows you to use any character, we want to show in this post why we do not recommend it.
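To make the identifier convention concrete, here is a minimal Java sketch. `ColumnId` is a hypothetical helper written for this post, not part of the HBase client API. It assumes the family is everything before the first colon, so any further colons end up inside the qualifier.

```java
// Hypothetical helper for illustration only; not part of the HBase client API.
final class ColumnId {
    final String family;
    final String qualifier;

    private ColumnId(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /** Splits at the FIRST colon only, so the qualifier may itself contain colons. */
    static ColumnId parse(String identifier) {
        int idx = identifier.indexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException("Missing ':' in " + identifier);
        }
        return new ColumnId(identifier.substring(0, idx), identifier.substring(idx + 1));
    }

    public static void main(String[] args) {
        ColumnId c = ColumnId.parse("w:rank:tws");
        System.out.println(c.family + " | " + c.qualifier); // prints "w | rank:tws"
    }
}
```

Under this reading 'w:rank:tws' is unambiguous: family 'w', qualifier 'rank:tws'. The trouble described in the rest of the post comes from tools that interpret the extra colon differently.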

HBase is great for scalable and quick queries, but what if we want to scan and process the entire table? Since our physical data resides in HDFS, we can run a MapReduce job on top of it. Good! But do we really want to write a raw M/R job? Wouldn't it be nice to have a SQL-like language that automatically maps our data from HBase into something more structured?

Apache Hive

Hive is a data warehouse that projects structure onto the data stored in HDFS and provides a SQL-like language called HiveQL for easily building queries that are then translated into M/R jobs. Hive can externally map data outside its repository as long as it resides in HDFS. This solution is called Hive external mapping and it works fine (almost) with data stored in HBase.

[code]
CREATE EXTERNAL TABLE db.hive_table(rowkey STRING, rank STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ":key,w:rank")
TBLPROPERTIES ('hbase.table.name' = 'HBASE_TABLE_NAME');
[/code]

Hive uses a SerDe for serializing/deserializing data stored in any custom format. Unfortunately, in the current version 0.11.0, due to this issue, the SerDe is not able to map column qualifiers that contain the colon character.

You cannot map a column with the following properties:

[code]
CREATE EXTERNAL TABLE db.hive_table(rowkey STRING, rank STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ":key,w:rank:tws,w:rank:hd")
TBLPROPERTIES ('hbase.table.name' = 'HBASE_TABLE_NAME');
[/code]

Otherwise you will get an error like:

[code]
FAILED: Error in metadata: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, column qualifier specification.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[/code]

According to the Java documentation in the source code:

[code language="java"]
/**
 * Utility method for parsing a string of the form '-,b,s,-,s:b,…' as a means of specifying
 * whether to use a binary or an UTF string format to serialize and de-serialize primitive
 * data types like boolean, byte, short, int, long, float, and double. This applies to
 * regular columns and also to map column types which are associated with an HBase column
 * family. For the map types, we apply the specification to the key or the value provided it
 * is one of the above primitive types. The specifier is a colon separated value of the form
 * -:s, or b:b where we have 's', 'b', or '-' on either side of the colon. 's' is for string
 * format storage, 'b' is for native fixed width byte oriented storage, and '-' uses the
 * table level default.
 *
 * @param hbaseTableDefaultStorageType - the specification associated with the table property
 * hbase.table.default.storage.type
 * @throws SerDeException on parse error.
 */
[/code]
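The restriction behind the error above can be illustrated with a small Java sketch. This is a simplified, hypothetical validator written for this post, not the actual Hive SerDe code. It assumes that a mapping entry is accepted only when it contains a single colon, which matches the behavior described here but not necessarily the exact implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical validator; NOT the actual Hive SerDe code.
final class MappingCheck {
    /** Returns the entries of a comma-separated columns mapping that contain more than one colon. */
    static List<String> rejectedEntries(String columnsMapping) {
        List<String> rejected = new ArrayList<>();
        for (String entry : columnsMapping.split(",")) {
            if (entry.equals(":key")) {
                continue; // the rowkey entry is a special case
            }
            // Assumption: exactly one colon separates family and qualifier.
            if (entry.indexOf(':') != entry.lastIndexOf(':')) {
                rejected.add(entry);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejectedEntries(":key,w:rank"));               // prints "[]"
        System.out.println(rejectedEntries(":key,w:rank:tws,w:rank:hd")); // prints "[w:rank:tws, w:rank:hd]"
    }
}
```

Under that assumption, every qualifier that uses the colon as a hierarchy delimiter is rejected before any data is read.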

Hence, if you want to stick with your schema design, you need a workaround for the current release of Hive to map the underlying HBase table.

Recent Hive versions have a column prefix matching feature that allows you to map all the columns matching a specified prefix.

[code]
CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,w:rank*")
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
[/code]

This will create a column of type MAP<STRING,STRING> in Hive, where the key of the map will be the full column name (e.g. 'rank:tws') and the value is the value of the cell.
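As a rough picture of what prefix matching produces, the following Java sketch filters one row's cells by qualifier prefix. `PrefixMapping` and the sample values are made up for illustration. It is plain Java and does not call Hive or HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of prefix matching; no Hive or HBase involved.
final class PrefixMapping {
    /** Keeps only the cells whose qualifier starts with the given prefix. */
    static Map<String, String> selectByPrefix(Map<String, String> familyCells, String prefix) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : familyCells.entrySet()) {
            if (cell.getKey().startsWith(prefix)) {
                selected.put(cell.getKey(), cell.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> w = new LinkedHashMap<>();
        w.put("rank:tws", "3");    // made-up sample values
        w.put("rank:hd", "7");
        w.put("n_clicks:1", "42");
        System.out.println(selectByPrefix(w, "rank")); // prints "{rank:tws=3, rank:hd=7}"
    }
}
```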
If you are not running the latest version of Hive, you may miss this feature. In that case, what you want to do is map the entire column family:

[code]
CREATE EXTERNAL TABLE hive_hbase_test(rowkey STRING, w MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,w:")
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
[/code]

Now you will have a single column containing the map of all the fields. If you want to split them out, you could create a view on top of this table, extracting the single columns from this big map.

[code]
CREATE VIEW hive_hbase_test_with_columns
AS
SELECT rowkey,
w['rank:tws'] as rank_tws,
w['rank:hd'] as rank_hd,
w['n_clicks:1'] as n_clicks_1,
w['n_clicks:2'] as n_clicks_2
FROM hive_hbase_test;
[/code]
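What the view does per row can be sketched in plain Java. `ViewProjection` is a hypothetical stand-in with made-up sample values, and it assumes the whole family map has already been loaded before individual qualifiers are picked out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the view; plain Java, no Hive involved.
final class ViewProjection {
    /** Picks the requested qualifiers out of an already materialized family map. */
    static Map<String, String> project(Map<String, String> w, String... qualifiers) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String q : qualifiers) {
            // A qualifier missing from the row simply yields null here.
            columns.put(q.replace(':', '_'), w.get(q));
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, String> w = new LinkedHashMap<>();
        w.put("rank:tws", "3");          // made-up sample values
        w.put("rank:hd", "7");
        w.put("referral", "example");    // loaded even though it is never projected
        System.out.println(project(w, "rank:tws", "rank:hd", "n_clicks:1"));
        // prints "{rank_tws=3, rank_hd=7, n_clicks_1=null}"
    }
}
```

Note that `w` holds every cell of the family before `project` runs, including the ones that are never selected.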

The drawback is that the view is applied after the base mapping takes place, which means we cannot avoid loading all the columns even if we are not interested in all of them.
Hopefully future releases of the SerDe will fix this problem and we will be able to simply map column qualifiers that contain any arbitrary character, as HBase supports.


Source: https://datasciencevademecum.com/2014/05/28/hive-hbase-mapping-with-colon-character/
