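HBase is a database that runs on a Hadoop cluster. Unlike traditional databases, which enforce strict ACID (atomicity, consistency, isolation, durability) guarantees, HBase relaxes these requirements to gain better scalability, and it is better suited to storing unstructured and semi-structured data.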
Apache HBase is a database that runs on a Hadoop cluster. HBase is not a traditional RDBMS, as it relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties of traditional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema like with an RDBMS, making it ideal for storing unstructured or semi-structured data.
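Traditional databases vs. HBase
First you need to understand what NoSQL is. You can read up on it here first: http://www.runoob.com/mongodb/nosql.html
HBase is also built on the NoSQL idea, so why do we need these systems? Let's look at traditional databases first: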
Why do we need NoSQL/HBase? First, let’s look at the pros of relational databases before we discuss their limitations:
- Relational databases have provided a standard persistence model
- SQL has become a de-facto standard model of data manipulation
- Relational databases manage concurrency for transactions
- Relational databases have lots of tools
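Traditional databases have been around for a long time, so why invent NoSQL or HBase? Because data volumes grew too large. For example, with a few hundred records in a relational database, a full table scan still finishes in acceptable time, but once you reach tens or hundreds of millions of records the query time becomes unacceptable. As data grows we need to scale the database. One way is of course to buy a better server, but the downside is obvious: it is expensive! And as the server gets bigger you may run into other limits.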
Relational databases were the standard for years, so what changed? With more and more data came the need to scale. One way to scale is vertically with a bigger server, but this can get expensive, and there are limits as your size increases.
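How NoSQL differs from traditional databases
The other way to scale is horizontally, by adding machines. Compared with buying a better server this costs far less: you just buy more of the same machines. Data that used to live on one machine is then distributed across all of them, partitioned by row. For example, with 10,000 rows, rows 1-2500 go to machine A, rows 2501-5000 go to machine B, and so on. A traditional database is not designed to do this for you. Note also that this kind of storage gives up some relational database operations: a relational database is designed to run on a single node rather than on a cluster, so the two support different sets of operations.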
An alternative to vertical scaling is to scale horizontally with a cluster of machines, which can use commodity hardware. This can be cheaper and more reliable. To horizontally partition or shard an RDBMS, data is distributed on the basis of rows, with some rows residing on a single machine and the other rows residing on other machines. However, it’s complicated to partition or shard a relational database, and it was not designed to do this automatically. In addition, you lose the querying, transactions, and consistency controls across shards. Relational databases were designed for a single node; they were not designed to be run on clusters.
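As a rough illustration of row-based partitioning, here is a minimal sketch that assumes 10,000 rows spread evenly over four machines. The machine names and the `shard_for_row` helper are made up for this example and are not part of any database API.

```python
def shard_for_row(row_number, rows_per_shard=2500, shards=("A", "B", "C", "D")):
    """Return the machine holding a 1-based row number under range partitioning."""
    # Rows 1..2500 map to index 0, rows 2501..5000 to index 1, and so on.
    index = (row_number - 1) // rows_per_shard
    if not 0 <= index < len(shards):
        raise ValueError(f"row {row_number} is outside the partitioned range")
    return shards[index]


print(shard_for_row(1), shard_for_row(2500), shard_for_row(2501))
```

Every query that touches rows on more than one machine now has to be coordinated by the application, which is the part a single-node relational database does not do for you.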
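Limitations of the traditional data model
Database courses teach normalization of relational databases: splitting relation schemas removes redundant data. But the more normalized the schema and the more tables it is split into, the more tables a query has to join, which of course hurts performance. HBase has no such normalization step. It stores data that is likely to be accessed together in the same place, so it avoids this query-time cost. The figure below shows the difference between the two storage models: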
Database normalization eliminates redundant data, which makes storage efficient. However, a normalized schema causes joins for queries, in order to bring the data back together again. While HBase does not support relationships and joins, data that is accessed together is stored together so it avoids the limitations associated with a relational model. See the difference in data storage models in the chart below:
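As mentioned above, data that is likely to be accessed together is stored together, and this is the idea HBase scales on. It uses a key-value approach: data is grouped by key, and the key determines which node stores it. Every node therefore holds a portion of the original data. HBase is in fact an implementation of Google's BigTable.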
HBase was designed to scale because data that is accessed together is stored together. Grouping the data by key is central to running on a cluster. In horizontal partitioning or sharding, the key range is used for sharding, which distributes different data across multiple servers. Each server is the source for a subset of data. Data that is accessed together sits on the same server, which keeps access fast as the cluster grows. HBase is actually an implementation of the BigTable storage architecture, a distributed storage system developed by Google to manage structured data at very large scale.
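HBase characteristics: distributed, scalable, fast
HBase is column-family oriented! Every row has a key, the whole table is indexed by that key, and you can use a key to look up a row. So what exactly is a column family? It is simply a set of columns. For example, an Address column family could group the columns Province, City, and Street. A row may have several column families. In effect, related fields from a relational database are grouped together, so each row can be seen as the combination of its column families.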
HBase is referred to as a column family-oriented data store. It’s also row-oriented: each row is indexed by a key that you can use for lookup (for example, look up a customer with the ID of 1234). Each column family groups like data (customer address, order) within rows. Think of a row as the join of all values in all column families.
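One way to picture "a row as the join of its column families" is a nested dictionary. The sketch below is only a mental model: the family names, column names, values, and the `join_families` helper are invented for illustration.

```python
# Hypothetical customer row with two column families.
row = {
    "address": {"province": "Ontario", "city": "Toronto", "street": "1 Main St"},
    "order": {"id": "A-17", "total": "42.00"},
}


def join_families(row):
    """Flatten a row into {"family:column": value} pairs."""
    return {
        f"{family}:{column}": value
        for family, columns in row.items()
        for column, value in columns.items()
    }


print(join_families(row))
```

The flattened names use the family:column form, so `address:city` and `order:id` live in the same row but belong to different families.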
HBase is a distributed database. Data is grouped by key, and the key is the atomic unit for updates.
HBase is also considered a distributed database. Grouping the data by key is central to running on a cluster and sharding. The key acts as the atomic unit for updates. Sharding distributes different data across multiple servers, and each server is the source for a subset of data.
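The HBase data model
After all that abstract talk, we can finally look at how HBase actually stores data. As noted, every row in HBase has a key, which we call the RowKey. It plays a role similar to a primary key in a relational database: given the RowKey, you can find the row. Rows are sorted by RowKey. This is a fundamental part of the HBase storage model, and HBase adheres to it strictly.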
Data stored in HBase is located by its “rowkey.” This is like a primary key from a relational database. Records in HBase are stored in sorted order, according to rowkey. This is a fundamental tenet of HBase and is also a critical semantic used in HBase schema design.
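Since storage is distributed, we have to consider how to split a complete table into pieces that are stored separately. We already know that rows are sorted by key, so the table is simply cut along the row order, for example into groups of 10 rows. The original data is thus cut into many pieces. Each piece is called a Region, and each Region is assigned to one node for storage.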
Tables are divided into sequences of rows, by key range, called regions. These regions are then assigned to the data nodes in the cluster called “RegionServers.” This scales read and write capacity by spreading regions across the cluster. This is done automatically and is how HBase was designed for horizontal sharding.
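Because rows are sorted by key and each region covers a contiguous key range, finding the region for a rowkey is a search over the region start keys. The sketch below is a toy model, not HBase client code: the split points, server names, and helper functions are assumptions chosen for the example.

```python
import bisect

# Hypothetical split points: region i holds rowkeys in [start_i, start_{i+1}).
REGION_START_KEYS = [b"", b"g", b"n", b"t"]
# Hypothetical assignment of each region to a RegionServer.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region(rowkey: bytes) -> int:
    """Return the index of the region whose key range contains rowkey."""
    # bisect_right finds the first start key greater than rowkey;
    # the region just before it is the one that contains the key.
    return bisect.bisect_right(REGION_START_KEYS, rowkey) - 1


def server_for(rowkey: bytes) -> str:
    """Return the server that hosts the region containing rowkey."""
    return REGION_SERVERS[locate_region(rowkey)]


print(locate_region(b"apple"), server_for(b"hello"))
```

A server can host several regions, which is why `rs1` appears twice in the assignment list.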
Each Region holds a number of rows, and each row is a combination of column families. On disk, each column family is stored in its own storage files (HFiles).
The image below shows how column families are mapped to storage files. Column families are stored in separate files, which can be accessed separately.
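We have covered column families and RowKeys. What remains is the data itself, which is stored in the cells of the table. How do you locate the data in one cell in this model? First the RowKey locates a row, then the column family locates a family within that row, and since a family contains several columns, the column name locates the column. That looks like enough to locate a value, but in fact a timestamp is needed as well.
The timestamp comes into play on updates. When HBase updates data it does not simply overwrite the original value. It stores the new value while keeping the old one, so a version-like attribute is needed to tell the two apart, and a timestamp is a natural fit.
The locating information plus the data stored in the cell is called a KeyValue structure. The key is rowkey + column family + column name + timestamp and is used to locate the data. The value is the stored data.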
The data is stored in HBase table cells. The entire cell, with the added structural information, is called a KeyValue. The row key, column family name, column name, timestamp, and value are stored for every cell for which you have set a value. The key consists of the row key, column family name, column name, and timestamp.
Logically, cells are stored in a table format, but physically, rows are stored as linear sets of cells containing all the key value information inside them.
In the image below, the top left shows the logical layout of the data, while the lower right section shows the physical storage in files. Column families are stored in separate files.
HBase often encounters cells with no data. It does not store empty cells, so sparse tables waste no storage space. Every value in the table has a version. By default the timestamp is used as the version, though you can define your own. So the cell located by row + family + column may hold several versions of the data.
As mentioned before, the complete coordinates to a cell's value are: Table:Row:Family:Column:Timestamp -> Value. HBase tables are sparsely populated. If data doesn’t exist at a column, it’s not stored. Table cells are versioned uninterpreted arrays of bytes. You can use the timestamp or set up your own versioning system. For every coordinate row : family : column, there can be multiple versions of the value.
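To make the cell coordinates concrete, here is a small sketch that flattens a logical table into a sorted list of KeyValue-like records. It is a simplified model under stated assumptions: the `KeyValue` tuple, the `flatten` helper, and the sample rows are invented here and do not mirror the real HFile layout byte for byte.

```python
from typing import NamedTuple


class KeyValue(NamedTuple):
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    value: bytes


def flatten(table):
    """Turn {row: {family: {qualifier: {timestamp: value}}}} into sorted KeyValues."""
    cells = [
        KeyValue(row, family, qualifier, ts, value)
        for row, families in table.items()
        for family, columns in families.items()
        for qualifier, versions in columns.items()
        for ts, value in versions.items()
    ]
    # Sort by row, family, qualifier, then newest timestamp first.
    return sorted(cells, key=lambda kv: (kv.row, kv.family, kv.qualifier, -kv.timestamp))


table = {
    b"row1": {b"addr": {b"city": {100: b"Paris", 200: b"Lyon"}}},
    b"row2": {b"addr": {b"street": {150: b"Main St"}}},  # no city was ever written
}
cells = flatten(table)
for kv in cells:
    print(kv)
```

Only three records exist for the two rows. `row2` has no `city` record at all, which is how a sparse table avoids paying for empty cells.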
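With versions in mind, a put can be seen as both an insert and an update: it does not modify the existing data directly but creates a new entry, and because the new entry has a newer version, it also acts as an update.
When data is deleted, a tombstone marker is added for it. Data covered by a tombstone is not returned by queries.
When querying, if the relevant parameters are given, specific versions are returned. If no version parameter is given, the latest version is returned by default.
The number of versions kept is configurable per column family. The default described here is three versions (HBase releases from 0.96 on default to one). When the limit is exceeded, the oldest versions are discarded so that only the newest ones remain.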
Versioning is built in. A put is both an insert (create) and an update, and each one gets its own version. Delete gets a tombstone marker. The tombstone marker prevents the data being returned in queries. Get requests return specific version(s) based on parameters. If you do not specify any parameters, the most recent version is returned. You can configure how many versions you want to keep and this is done per column family. The default is to keep up to three versions. When the max number of versions is exceeded, extra records will be eventually removed.
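The put, delete, and get behaviour described above can be sketched with a toy class for a single row:family:column coordinate. This is an illustrative model, not HBase code: `VersionedCell` and its methods are hypothetical, old versions are dropped eagerly here whereas the text says they are removed eventually, and the tombstone is simplified to hide every version at or before its timestamp.

```python
class VersionedCell:
    """Toy model of the versions stored at one row:family:column coordinate."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = {}        # timestamp -> value
        self.tombstone_ts = None  # newest delete marker, if any

    def put(self, timestamp, value):
        """Insert a new version; existing versions are never modified in place."""
        self.versions[timestamp] = value
        # Keep only the newest max_versions entries.
        for ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[ts]

    def delete(self, timestamp):
        """Write a tombstone that hides versions at or before timestamp."""
        if self.tombstone_ts is None or timestamp > self.tombstone_ts:
            self.tombstone_ts = timestamp

    def get(self, max_versions=1):
        """Return up to max_versions visible (timestamp, value) pairs, newest first."""
        visible = [
            (ts, value)
            for ts, value in sorted(self.versions.items(), reverse=True)
            if self.tombstone_ts is None or ts > self.tombstone_ts
        ]
        return visible[:max_versions]


cell = VersionedCell()
for ts, value in [(1, b"a"), (2, b"b"), (3, b"c"), (4, b"d")]:
    cell.put(ts, value)
print(cell.get(), cell.get(max_versions=10))
```

After four puts only the three newest versions remain, and a plain `get()` returns just the most recent one.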
Summary
To summarize, here is content quoted from a blog post (http://www.cnblogs.com/tgzhu/p/5857035.html):
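Table: As in a traditional relational database, HBase organizes data into tables, and applications store data in HBase tables.
Row: Rows in an HBase table are uniquely identified by RowKey. Whether the key is a number or a string, it is ultimately converted to bytes for storage. Rows are sorted in lexicographic order of RowKey.
Column Family: An HBase table is organized by rows and columns, with the added concept of the column family, which groups one or more columns. Every HBase column must belong to a column family, and creating a table only requires a table name and at least one column family.
Cell: The intersection of a row and a column is called a cell. The cell content is the column's value, stored in binary form, and it is versioned.
Version: Each cell can keep multiple versions of its value (how many can be specified when the table is created), ordered newest first. The timestamp is a 64-bit integer that can be supplied on write or assigned automatically by the RegionServer.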
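Notes:
- HBase has no data types; every column value is stored as an uninterpreted byte array.
- Unlike a relational database, where the columns and their types must be declared when the table is created, each row of an HBase table can have different columns.
- Writes with the same RowKey are treated as operations on the same row. That is, a second write with the same RowKey can be regarded as an update to some columns of that row.
- A column name is the column family and the column qualifier joined by a colon, e.g. d:Name (d: column family, Name: column qualifier)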
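Here is an example (blog posts and authors) comparing the relational design with the HBase design. The relational table structure and data are shown in the figure below:
The article's author is a foreign key pointing to the primary key of the author table. Sample data for the two tables:
Designed with HBase, it looks like this: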
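Takeaways:
HBase does not support SQL-style conditional queries or ORDER BY. Records can only be read by row key (or a row key range) or by a full table scan.
When creating a table you only declare the table name and at least one column family name. Each column family is one storage unit.
The example above defines an HBase table blog with two column families, article and author, but in practice using a single column family is strongly recommended.
Columns do not need to be defined when the table is created and can be added dynamically. Columns in the same column family are clustered in one storage unit and sorted by column key, so columns with the same I/O characteristics should be placed in the same column family for better performance. Note that columns can be added and removed freely, which is a big difference from traditional databases and is why HBase suits unstructured data.
HBase identifies a piece of data by row and column. Its value may have multiple versions, sorted by time in descending order so that the newest comes first, and queries return the newest version by default. In the example above, author:nickname for row key=1 has two versions: 1317180070811 with “一葉渡江” and 1317180718830 with “yedu” (in business terms, the nickname was changed to yedu at some point, but the old value still exists). The timestamp defaults to the current system time (in milliseconds) and can also be specified on write.
Each cell value is uniquely indexed by four keys, tableName + RowKey + ColumnKey + Timestamp ---> value. For example, {tableName='blog', RowKey='1', ColumnName='author:nickname', Timestamp='1317180718830'} indexes the unique value “yedu”.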
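The four-key lookup can be sketched with a plain dictionary that holds the two nickname versions from the example. The `get` helper below is a hypothetical stand-in for illustration, not the HBase client API.

```python
# Cells from the blog example, keyed by (table, rowkey, column, timestamp).
cells = {
    ("blog", "1", "author:nickname", 1317180070811): "一葉渡江",
    ("blog", "1", "author:nickname", 1317180718830): "yedu",
}


def get(table, rowkey, column, timestamp=None):
    """Return one value; without a timestamp, return the newest version."""
    if timestamp is not None:
        return cells.get((table, rowkey, column, timestamp))
    matches = [
        (ts, value)
        for (t, r, c, ts), value in cells.items()
        if (t, r, c) == (table, rowkey, column)
    ]
    # The largest timestamp is the most recent version.
    return max(matches)[1] if matches else None


print(get("blog", "1", "author:nickname"))
```

Passing the older timestamp explicitly still returns the old nickname, since the earlier version was never overwritten.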
Storage types
- TableName is a string
- RowKey and ColumnName are binary values (Java type byte[])
- Timestamp is a 64-bit integer (Java type long)
- value is a byte array (Java type byte[])
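Since keys and values are raw bytes, how a number is encoded decides how it sorts. The sketch below compares text-encoded and fixed-width rowkeys. The helper names are made up for this example, and the 8-byte big-endian layout is an assumption chosen to match a 64-bit long.

```python
import struct


def long_to_bytes(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def bytes_to_long(b: bytes) -> int:
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    return struct.unpack(">q", b)[0]


# Rowkeys written as decimal text sort lexicographically, not numerically.
rowkeys_as_text = sorted(str(n).encode("utf-8") for n in (9, 10, 100))
# Fixed-width big-endian keys keep numeric order for non-negative values.
rowkeys_fixed = sorted(long_to_bytes(n) for n in (9, 10, 100))

print(rowkeys_as_text)
print([bytes_to_long(b) for b in rowkeys_fixed])
```

With text keys, row "10" sorts before row "9", which matters for range scans because rows are stored in rowkey order.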