1 Background
A friend at a startup had a server crash after its disk array failed. When the MySQL service was restarted, it reported the following error:
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 9120034833
150125 16:12:51 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 150125 16:12:51 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 5.5.37-MariaDB-log
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=0
max_threads=1002
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332093 K bytes of memory
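2 Analysis
The line to focus on is "mysqld got signal 11". Judging from the log, the machine crash corrupted the log files: recovery gets about 40% of the way through applying log records and then mysqld dies. As a result the database cannot complete crash recovery after the restart, let alone serve requests.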
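3 Solution
Because the logs are already corrupted, an unconventional approach is used here: first set the innodb_force_recovery parameter so that mysqld skips the recovery step, start mysqld, export the data, and then rebuild the database.
innodb_force_recovery can be set to a value from 1 to 6. A larger value includes the effects of all smaller values.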
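- 1 (SRV_FORCE_IGNORE_CORRUPT): ignore corrupt pages that are detected.
- 2 (SRV_FORCE_NO_BACKGROUND): prevent the master thread from running; if the master thread needed to perform a full purge, it would cause a crash.
- 3 (SRV_FORCE_NO_TRX_UNDO): do not run transaction rollbacks.
- 4 (SRV_FORCE_NO_IBUF_MERGE): do not run insert buffer merge operations.
- 5 (SRV_FORCE_NO_UNDO_LOG_SCAN): do not look at the undo logs; InnoDB treats uncommitted transactions as committed.
- 6 (SRV_FORCE_NO_LOG_REDO): do not perform the redo log roll-forward.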
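Because each level includes the effects of all lower levels, the list above can be read cumulatively. The following is a minimal Python sketch of that reading, not anything taken from the server itself: the constant names and their values 1 to 6 are the documented ones, while the `EFFECTS` table and the `effects_in_force` helper are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class ForceRecovery(IntEnum):
    """The six documented innodb_force_recovery levels."""
    SRV_FORCE_IGNORE_CORRUPT = 1
    SRV_FORCE_NO_BACKGROUND = 2
    SRV_FORCE_NO_TRX_UNDO = 3
    SRV_FORCE_NO_IBUF_MERGE = 4
    SRV_FORCE_NO_UNDO_LOG_SCAN = 5
    SRV_FORCE_NO_LOG_REDO = 6


# Short summaries paraphrased from the list above (illustrative only).
EFFECTS = {
    ForceRecovery.SRV_FORCE_IGNORE_CORRUPT: "ignore corrupt pages",
    ForceRecovery.SRV_FORCE_NO_BACKGROUND: "master thread does not run",
    ForceRecovery.SRV_FORCE_NO_TRX_UNDO: "no transaction rollback",
    ForceRecovery.SRV_FORCE_NO_IBUF_MERGE: "no insert buffer merge",
    ForceRecovery.SRV_FORCE_NO_UNDO_LOG_SCAN: "undo logs not examined",
    ForceRecovery.SRV_FORCE_NO_LOG_REDO: "no redo log roll-forward",
}


def effects_in_force(level):
    """Return every effect active at `level`, lowest level first."""
    if not 0 <= level <= 6:
        raise ValueError("innodb_force_recovery must be between 0 and 6")
    return [EFFECTS[member] for member in ForceRecovery if member <= level]


if __name__ == "__main__":
    for effect in effects_in_force(3):
        print(effect)
```

Calling `effects_in_force(6)` returns all six entries, which is why level 6 is the most drastic setting and is best reserved for cases like this one where lower levels still crash.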
Notes
a When the parameter is set to a value greater than 0, SELECT, CREATE and DROP operations on tables are allowed, but INSERT, UPDATE and DELETE are not.
b When innodb_purge_threads and innodb_force_recovery are set together, startup gets stuck in a loop:
150125 17:07:42 InnoDB: Waiting for the background threads to start
150125 17:07:43 InnoDB: Waiting for the background threads to start
150125 17:07:44 InnoDB: Waiting for the background threads to start
150125 17:07:45 InnoDB: Waiting for the background threads to start
150125 17:07:46 InnoDB: Waiting for the background threads to start
150125 17:07:47 InnoDB: Waiting for the background threads to start
Change the following two parameters in my.cnf:
innodb_force_recovery=6
innodb_purge_threads=0
**Restart MySQL**
150125 17:10:47 [Note] Crash recovery finished.
150125 17:10:47 [Note] Server socket created on IP: '0.0.0.0'.
150125 17:10:47 [Note] Event Scheduler: Loaded 0 events
150125 17:10:47 [Note] /vdata/webserver/mysql/bin/mysqld: ready for connections.
Version: '5.5.37-MariaDB-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution
Take a logical dump of the database immediately. Once it has finished, set innodb_force_recovery back to 0 and innodb_purge_threads=1, then rebuild the database. Also note that in MySQL 5.5 and earlier, the combination innodb_purge_threads=1 and innodb_force_recovery>1 triggers the repeating warning shown above (innodb_force_recovery=1 is fine).
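The post does not show the export command that was used. As a rough sketch only, a logical dump could be driven with mysqldump along the following lines. The socket path comes from the startup log above; the output path, the user name, and the `build_dump_command` helper are assumptions made for illustration and should be adapted to the actual environment.

```python
import shlex


def build_dump_command(socket_path, out_file, user="root"):
    """Build a mysqldump argument list for a full logical export.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    return [
        "mysqldump",
        f"--socket={socket_path}",
        f"--user={user}",
        "--password",               # no value given, so mysqldump prompts for it
        "--all-databases",          # export every database on the instance
        f"--result-file={out_file}",
    ]


if __name__ == "__main__":
    # /tmp/mysql.sock is taken from the log; the dump path is an assumed example.
    cmd = build_dump_command("/tmp/mysql.sock", "/backup/all_databases.sql")
    print(shlex.join(cmd))  # review the command, then run it by hand
```

Printing the command instead of executing it is deliberate: with the server running under innodb_force_recovery=6 it seems safer to check each step by hand, and to dump table by table if a full export fails on a corrupt table.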
Cause:
The MySQL source code shows why setting innodb_purge_threads together with innodb_force_recovery results in a loop:
while (srv_shutdown_state == SRV_SHUTDOWN_NONE) {
    if (srv_thread_has_reserved_slot(SRV_MASTER) == ULINT_UNDEFINED
        || (srv_n_purge_threads == 1
            && srv_thread_has_reserved_slot(SRV_WORKER)
            == ULINT_UNDEFINED)) {
        ut_print_timestamp(stderr);
        fprintf(stderr, " InnoDB: Waiting for the background threads to start\n");
        os_thread_sleep(1000000);
    } else {
        break;
    }
}
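The loop exits only once the master thread has reserved its slot and, when innodb_purge_threads is 1, the purge worker has reserved its slot as well. With innodb_force_recovery at 2 or higher, background work is held back, so the purge worker presumably never reserves a slot and the message repeats once per second. So whenever innodb_force_recovery>1 is needed, turn innodb_purge_threads off by setting it to 0 (the default).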
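The rule can be checked before restarting the server. Below is a minimal Python sketch, assuming the relevant options live in the [mysqld] section of a my.cnf-style file and that an unset innodb_purge_threads means 0, as stated above. The `check_recovery_settings` function is a hypothetical helper, not part of MySQL or MariaDB, and `configparser` does not understand my.cnf directives such as `!includedir` placed before the first section.

```python
import configparser


def check_recovery_settings(cnf_text):
    """Return warnings for risky option combinations in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,            # my.cnf allows bare flags without "=value"
        strict=False,                   # tolerate repeated options
        interpolation=None,             # treat "%" literally
        inline_comment_prefixes=("#",),
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return []

    # my.cnf accepts both dashes and underscores in option names.
    opts = {key.replace("-", "_"): value for key, value in parser.items("mysqld")}
    force_recovery = int(opts.get("innodb_force_recovery") or 0)
    purge_threads = int(opts.get("innodb_purge_threads") or 0)  # assumed default 0

    warnings = []
    if force_recovery > 1 and purge_threads >= 1:
        warnings.append(
            "innodb_force_recovery=%d with innodb_purge_threads=%d may loop on "
            "'Waiting for the background threads to start'; "
            "set innodb_purge_threads=0" % (force_recovery, purge_threads)
        )
    return warnings


if __name__ == "__main__":
    sample = "[mysqld]\ninnodb_force_recovery=6\ninnodb_purge_threads=1\n"
    for warning in check_recovery_settings(sample):
        print(warning)
```

In practice the text would be read from the real configuration file, for example `check_recovery_settings(open("/etc/my.cnf").read())`, where the path is again an assumption.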
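4 Summary
A crash of MySQL itself or of the server hosting it can cause all kinds of problems: error 1594 between master and slave (enabling the crash-safe feature in 5.6 avoids error 1594 to a large extent; a later post on new features in 5.6 will cover it), error 1236, corrupted logs, corrupted data files, and so on. This case is only one of them. Read the logs carefully for the relevant error messages and work through the problem step by step.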