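HBase Optimization in Practice (Part 1)

HBase optimization

1: GC parameter tuning:

When a region server is under heavy load, memory allocation cannot safely rely only on the JRE's built-in assumptions about how the program behaves. Use the options the JRE provides to tune the garbage collection strategy.

Data written to disk on behalf of different clients is not contiguous in memory, which leaves holes in the JVM heap.

Young generation: between 128 MB and 512 MB. Old generation: several GB.

Add to the configuration file: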

hbase-env.sh:

HBASE_OPTS or HBASE_REGIONSERVER_OPTS (recommended). Recommended settings:

GC parameters:

-Xmx50g

-XX:+UseG1GC

-XX:+UnlockExperimentalVMOptions

-XX:MaxGCPauseMillis=100

-XX:InitiatingHeapOccupancyPercent=65

-XX:+ParallelRefProcEnabled

-XX:MaxTenuringThreshold=1

-XX:G1HeapRegionSize=16m

Parameters for GC log output:

-verbose:gc

-XX:+PrintGC

-XX:+PrintGCDetails

-XX:+PrintGCApplicationStoppedTime

-XX:+PrintHeapAtGC

-XX:+PrintGCDateStamps

-XX:+PrintAdaptiveSizePolicy

-XX:+PrintTenuringDistribution

-XX:+PrintSafepointStatistics

-XX:PrintSafepointStatisticsCount=1

-XX:PrintFLSStatistics=1

-XX:+HeapDumpOnOutOfMemoryError

-XX:HeapDumpPath=/var/log/oom/hbase

-Xloggc:/var/log/hbase-server-gc.log
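Before restarting a region server with a new set of options, it can help to check that the local JVM accepts them and which collector ends up active. The following is a minimal sketch using only the standard java.lang.management API. The class name GcFlagCheck and its helper methods are made up for illustration, and the check for names starting with "G1" is an assumption about how HotSpot names its G1 collector beans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: reports the flags and collectors of the JVM it runs in.
class GcFlagCheck {

    // The options this JVM was started with, e.g. -Xmx50g or -XX:+UseG1GC.
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Assumption: G1 collector beans report names beginning with "G1".
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("JVM flags:  " + jvmFlags());
        System.out.println("Collectors: " + collectorNames());
        System.out.println("G1 active:  " + looksLikeG1(collectorNames()));
    }
}
```

Running this class with the same options as the region server (possibly with a smaller -Xmx on a test machine) shows whether the JVM starts at all with that flag set. Some of the logging flags above are tied to older JDK releases, so a newer JVM may reject them at startup.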

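2: HBase compression

Available codecs: GZIP/LZO/Snappy

Snappy performs slightly better and is the usual choice.

Checking codecs at HBase startup (set in hbase-site.xml; a region server will not start if a listed codec is unavailable):

hbase.regionserver.codecs

snappy,lzo

Enable compression: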

hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }

hbase> describe 'test'

DESCRIPTION                                                ENABLED
'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',      false
BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

1 row(s) in 0.1070 seconds

Or, on an existing table:

hbase> disable 'test'

hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}

hbase> enable 'test'

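3: Optimizing splits and compactions

3.1 Managed splitting

HBase can run into "split/compaction storms": regions that grow at about the same rate reach the maximum size together, split at about the same time, and the compactions that follow cause a spike in disk I/O.

Turn off automatic splitting and split manually instead.

To disable automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB.

It is not recommended to set it to its absolute maximum value of Long.MAX_VALUE.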

3.2 Region hot-spotting

/* Rowkey design 1: salted prefix */

long ts = System.currentTimeMillis();

// floorMod keeps the salt in [0, 8) even when the hash is negative
byte prefix = (byte) Math.floorMod(Long.hashCode(ts), 8);

// wrap the salt as a single byte; Bytes.toBytes has no byte overload
byte[] rowkey1 = Bytes.add(new byte[] { prefix }, Bytes.toBytes(ts));

/* Rowkey design 2: field swap, promoting another field ahead of the timestamp */

// assumes value is a String field of the record
String rowkey2 = value + System.currentTimeMillis();

/* Rowkey design 3: randomization */

MessageDigest md = MessageDigest.getInstance("MD5");

byte[] rowkey3 = md.digest(Bytes.toBytes(System.currentTimeMillis()));

/* Rowkey design 4: time order (newest first) */

long rowkey4 = Long.MAX_VALUE - System.currentTimeMillis();
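For experimenting with these key layouts without an HBase dependency, here is a minimal sketch that rebuilds the four designs with the JDK only. The class RowKeySketch and its method names are hypothetical. It assumes big-endian long encoding (the ByteBuffer default) and a bucket count of at most 127 so the salt fits in one byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; mirrors the four rowkey designs using only the JDK.
class RowKeySketch {

    // Design 1: one salt byte in [0, buckets) followed by the 8-byte timestamp.
    static byte[] salted(long timestamp, int buckets) {
        byte salt = (byte) Math.floorMod(Long.hashCode(timestamp), buckets);
        return ByteBuffer.allocate(1 + Long.BYTES).put(salt).putLong(timestamp).array();
    }

    // Design 2: another field comes first, the timestamp second.
    static byte[] fieldFirst(String field, long timestamp) {
        byte[] f = field.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(f.length + Long.BYTES).put(f).putLong(timestamp).array();
    }

    // Design 3: MD5 digest of the timestamp bytes, always 16 bytes long.
    static byte[] hashed(long timestamp) {
        byte[] ts = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        try {
            return MessageDigest.getInstance("MD5").digest(ts);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Design 4: reversed timestamp, so newer rows sort before older ones.
    static long reversed(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("salt bucket:     " + salted(now, 8)[0]);
        System.out.println("field-first len: " + fieldFirst("sensor-1", now).length);
        System.out.println("hashed len:      " + hashed(now).length);
        System.out.println("reversed:        " + reversed(now));
    }
}
```

Salting and hashing spread writes across regions but give up cheap time-range scans, since a reader has to query every salt bucket or cannot scan by time at all. The reversed timestamp only changes the sort order and does not spread the load by itself.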

You can also use move() in the admin API to move a region to another region server, or unassign() to remove a region of the affected table from its current server.

3.3 Pre-splitting regions

Specify the required regions when creating the table:

hbase> create 't1','f',SPLITS => ['10','20','30']

hbase> create 't14','f',SPLITS_FILE=>'splits.txt'

# create table with four regions based on random bytes keys

hbase> create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }

# create table with five regions based on hex keys

hbase> create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
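To get a feel for what split points a hex-based pre-split produces, the sketch below divides a fixed-width hex keyspace evenly. It is an approximation for illustration only, not the code behind HexStringSplit. The class HexSplitSketch and its splitPoints method are hypothetical, and the 8-digit key width is an assumption.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced split points over a fixed-width hex keyspace.
class HexSplitSketch {

    // Returns regions - 1 split keys, each left-padded to hexDigits characters.
    static List<String> splitPoints(int regions, int hexDigits) {
        if (regions < 1 || hexDigits < 1) {
            throw new IllegalArgumentException("regions and hexDigits must be positive");
        }
        BigInteger keyspace = BigInteger.ONE.shiftLeft(4 * hexDigits); // 16^hexDigits keys
        List<String> points = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            BigInteger point = keyspace
                    .multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(regions));
            StringBuilder hex = new StringBuilder(point.toString(16));
            while (hex.length() < hexDigits) {
                hex.insert(0, '0'); // pad so keys compare correctly as strings
            }
            points.add(hex.toString());
        }
        return points;
    }

    public static void main(String[] args) {
        // Four regions need three split points.
        System.out.println(splitPoints(4, 8));
    }
}
```

A list like this could be written one key per line into a file and passed to the shell through SPLITS_FILE, provided the rowkeys really are hex strings of that width.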

Reference: http://hbase.apache.org/book.html#compression

4: Load balancing:

Use the shell to disable the balancer:

hbase(main):001:0> balance_switch false

true

0 row(s) in 0.3590 seconds

This turns the balancer OFF. To reenable, do:

hbase(main):001:0> balance_switch true

false

0 row(s) in 0.3590 seconds

5: Merging regions:

In some special cases, for example after deleting a large amount of data, you may need to merge regions.

$ bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>

(If you feel you have too many regions and want to consolidate them, Merge is the utility you need.

Merge must be run when the cluster is down.)

6: Client API optimization:

6.1 Disable auto-flush

This applies when there are a large number of write operations.

When performing a lot of Puts, make sure that setAutoFlush is set to false on your Table instance.

Otherwise, the Puts will be sent one at a time to the RegionServer.

Puts added via table.put(Put) and table.put(List<Put>) wind up in the same write buffer.

If autoFlush = false, these messages are not sent until the write-buffer is filled.

To explicitly flush the messages, call flushCommits.

Calling close on the Table instance will invoke flushCommits.
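The effect of the write buffer on the number of round trips can be seen in a toy model. This is not the HBase client: WriteBufferSketch is a hypothetical class, and it counts buffered puts, whereas the real client write buffer is sized in bytes. It only illustrates the batching behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer; counts simulated round trips to a server.
class WriteBufferSketch {
    private final boolean autoFlush;
    private final int bufferCapacity; // simplification: capacity in puts, not bytes
    private final List<String> buffer = new ArrayList<>();
    private int roundTrips = 0;

    WriteBufferSketch(boolean autoFlush, int bufferCapacity) {
        this.autoFlush = autoFlush;
        this.bufferCapacity = bufferCapacity;
    }

    // With autoFlush every put is sent at once; otherwise only when the buffer is full.
    void put(String row) {
        buffer.add(row);
        if (autoFlush || buffer.size() >= bufferCapacity) {
            flushCommits();
        }
    }

    // Sends whatever is buffered in a single simulated round trip.
    void flushCommits() {
        if (!buffer.isEmpty()) {
            roundTrips++;
            buffer.clear();
        }
    }

    // Closing flushes the remaining puts, as the text above describes.
    void close() {
        flushCommits();
    }

    int roundTrips() {
        return roundTrips;
    }

    // Runs the given number of puts and returns the round trips used.
    static int simulate(boolean autoFlush, int bufferCapacity, int puts) {
        WriteBufferSketch table = new WriteBufferSketch(autoFlush, bufferCapacity);
        for (int i = 0; i < puts; i++) {
            table.put("row-" + i);
        }
        table.close();
        return table.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=true:  " + simulate(true, 100, 1000) + " round trips");
        System.out.println("autoFlush=false: " + simulate(false, 100, 1000) + " round trips");
    }
}
```

The trade-off is durability on the client side: puts that are still in the buffer are lost if the client process dies before a flush.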

6.2 Use scanner caching

For example, when HBase is the input source for a MapReduce job.

Set setCaching to a value much larger than the default.

If HBase is used as an input source for a MapReduce job, for example, make sure that the input Scan instance to the MapReduce job has setCaching set to something greater than the default (which is 1).

Using the default value means that the map task will call back to the region server for every record processed.

Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
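The saving is simple arithmetic: the number of fetch calls is roughly the row count divided by the caching value, rounded up. A small sketch follows, with a hypothetical class ScanCachingSketch. It is a simplified count that ignores the calls for opening and closing the scanner.

```java
// Hypothetical helper: estimates how many fetch calls a scan needs for a given caching value.
class ScanCachingSketch {

    // Ceiling of rows / caching: each call returns at most `caching` rows.
    static long fetchCalls(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println("caching=1:   " + fetchCalls(rows, 1) + " calls");
        System.out.println("caching=500: " + fetchCalls(rows, 500) + " calls");
    }
}
```

A larger caching value also means more memory per scanner on both client and region server, and more time spent per batch, so the value has to fit the row size and the scanner timeout.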

6.3 Limit the scan scope

6.4 Close the ResultScanner

7: Configuration tuning:

7.1 Reduce the ZooKeeper timeout

zookeeper.session.timeout

Default: three minutes.

7.2 Increase the number of region server handler threads

hbase.regionserver.handler.count

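Default: 10.

7.3 Increase the region size

Managing fewer regions lets the cluster run more smoothly.

Default: 256 MB.

7.4 Reduce the maximum number of log files

For write-heavy applications, lowering this value forces the server to flush data to disk more often, so the logs for data that has already been flushed to disk can be discarded.

7.5 Enable data compression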
