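What is MHA?
MHA (Master High Availability) is a MySQL failover solution written in Perl by a Japanese MySQL expert to keep a database highly available. It can complete a master failover within roughly 0 to 30 seconds. During failover, MHA resolves data differences between the slaves and recovers as much of the failed master's data as it can, so that the surviving servers stay consistent.
MHA has two roles: the manager node and the node (data) nodes. A deployment needs at least three database servers in a one-master, multiple-slave layout: one acts as the master, one as the standby for the master, and one as a plain slave. To get a better result, this lab uses four machines. Note that once the master goes down, the standby takes over as master; if the old master comes back online it does not become master again, because that would break database consistency.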
192.168.10.94    manager    management node
192.168.10.91    master     master
192.168.10.92    slave01    slave + standby master
192.168.10.93    slave02    slave
I. Environment initialization
1. Deploy MySQL on master, slave01 and slave02 (see the document 《【MySQL5.6.27安裝規范】-【運維部】-張偉科》)
2. Set the hostnames
On host manager, run:
# sed -i 's/HOSTNAME=.*/HOSTNAME=manager/g' /etc/sysconfig/network && hostname manager
On host master, run:
# sed -i 's/HOSTNAME=.*/HOSTNAME=master/g' /etc/sysconfig/network && hostname master
On host slave01, run:
# sed -i 's/HOSTNAME=.*/HOSTNAME=slave01/g' /etc/sysconfig/network && hostname slave01
On host slave02, run:
# sed -i 's/HOSTNAME=.*/HOSTNAME=slave02/g' /etc/sysconfig/network && hostname slave02
3. Hostname resolution
Run the following commands on manager
[root@manager ~]# cat >> /etc/hosts << EOF
192.168.10.94      manager
192.168.10.91      master
192.168.10.92      slave01
192.168.10.93      slave02
EOF
[root@manager ~]# scp -o StrictHostKeyChecking=no /etc/hosts root@master:/etc/
[root@manager ~]# scp -o StrictHostKeyChecking=no /etc/hosts root@slave01:/etc/
[root@manager ~]# scp -o StrictHostKeyChecking=no /etc/hosts root@slave02:/etc/
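Appending to /etc/hosts adds the same lines again every time the step is repeated. The following is a minimal sketch of an idempotent alternative, not part of the original procedure: the function name add_host_entry is hypothetical, and it assumes a standard awk is installed.

```shell
# add_host_entry FILE IP NAME
# Appends "IP<TAB>NAME" to FILE unless some non-comment line already maps NAME.
add_host_entry() (
    file=$1 ip=$2 name=$3
    # Look for NAME as a whole field after the address column.
    if ! awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
)
```

Called once per host, for example add_host_entry /etc/hosts 192.168.10.94 manager, it leaves the file unchanged when the name is already present.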
4. Passwordless SSH login
On host manager, run:
[root@manager ~]# ssh-keygen -t rsa
[root@manager ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
[root@manager ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
[root@manager ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02
On host master, run:
[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@manager
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
[root@master ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02
On host slave01, run:
[root@slave01 ~]# ssh-keygen -t rsa
[root@slave01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@manager
[root@slave01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
[root@slave01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02
On host slave02, run:
[root@slave02 ~]# ssh-keygen -t rsa
[root@slave02 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@manager
[root@slave02 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@master
[root@slave02 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
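The key distribution above is a full mesh: every host copies its public key to the other three. As a sketch only, the per-host command list can be generated rather than typed. The HOSTS variable and the function name print_key_copy_commands are hypothetical helpers introduced here for illustration; the function only prints commands and does not run them.

```shell
# Hosts taking part in the mesh (names as used in /etc/hosts above).
HOSTS="manager master slave01 slave02"

# print_key_copy_commands SELF
# Prints one ssh-copy-id command for every host in HOSTS except SELF.
print_key_copy_commands() (
    self=$1
    for peer in $HOSTS; do
        if [ "$peer" != "$self" ]; then
            printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub root@%s\n' "$peer"
        fi
    done
)
```

On each host, after ssh-keygen, running print_key_copy_commands "$(hostname)" lists the three commands to execute there.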
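II. MySQL setup
1. Configure replication between master, slave01 and slave02
In a MySQL 5.6 replication setup, the master must enable two important options, server-id and log-bin. The server-id must be unique across the whole topology and cannot be reused by another host; here the last octet of each host's IP address is used as its server-id. The slaves need the relay log enabled. Restart mysql after changing the configuration.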
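The server-id convention described here (last octet of the IP address) can be sketched as a one-line helper. The function name server_id_from_ip is hypothetical, and the convention only guarantees uniqueness while all servers sit in the same /24 network, as they do in this lab.

```shell
# server_id_from_ip IP
# Prints the last dotted component of an IPv4 address, e.g. for use as server-id.
server_id_from_ip() (
    ip=$1
    printf '%s\n' "${ip##*.}"
)
```

For the master at 192.168.10.91 this prints 91, which matches the server-id value configured in my.cnf.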
On host master, run:
[root@master ~]# egrep "log-bin|server-id|relay-log-purge" /app/mysql/my.cnf
server-id = 91
log-bin = mysql_bin
relay-log-purge=0
[root@master ~]# /sbin/iptables -I INPUT -p tcp --dport 3306 -j ACCEPT
[root@master ~]# /etc/rc.d/init.d/iptables save
On host slave01, run:
[root@slave01 ~]# egrep "log-bin|server-id|relay-log-purge" /app/mysql/my.cnf
server-id = 92
log-bin = mysql_bin
relay-log-purge=0
[root@slave01 ~]# /sbin/iptables -I INPUT -p tcp --dport 3306 -j ACCEPT
[root@slave01 ~]# /etc/rc.d/init.d/iptables save
On host slave02, run:
[root@slave02 ~]# egrep "log-bin|server-id|read-only|relay-log-purge" /app/mysql/my.cnf
server-id = 93
log-bin = mysql_bin
read-only=1
relay-log-purge=0
[root@slave02 ~]# /sbin/iptables -I INPUT -p tcp --dport 3306 -j ACCEPT
[root@slave02 ~]# /etc/rc.d/init.d/iptables save
2. Create the replication account on master and slave01. slave01 is the standby master, so it needs the same grant
[root@master ~]# mysql -e "grant all privileges on *.* to 'rep'@'%' identified by '20151012';flush privileges"
[root@slave01 ~]# mysql -e "grant all privileges on *.* to 'rep'@'%' identified by '20151012';flush privileges"
3. On master, check the master status
[root@master ~]# mysql -e 'show master status;'
+-------------------+----------+--------------+------------------+-------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+------------------+-------------------+
| mysql_bin.000001  |      120 |              |                  |                   |
+-------------------+----------+--------------+------------------+-------------------+
4. Start replication on slave01 and slave02 (slave01 is shown; slave02 is the same)
[root@slave01 ~]# mysql
mysql> CHANGE MASTER TO
    MASTER_HOST='192.168.10.91',
    MASTER_USER='rep',
    MASTER_PASSWORD='20151012',
    MASTER_PORT=3306,
    MASTER_LOG_FILE='mysql_bin.000001',
    MASTER_LOG_POS=120;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.91
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql_bin.000001
          Read_Master_Log_Pos: 120
               Relay_Log_File: slave01-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql_bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 458
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 91
                  Master_UUID: 9b1eb4a5-00f3-11e8-a3ba-ce006127c972
             Master_Info_File: /app/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
1 row in set (0.00 sec)
At this point replication is configured. The next step is to set up MHA.
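Of everything show slave status prints, the two fields that decide whether replication is healthy are Slave_IO_Running and Slave_SQL_Running. The sketch below reads that output on standard input and succeeds only when both are Yes. The function name replication_ok is hypothetical, and a standard awk is assumed.

```shell
# replication_ok
# Reads `show slave status\G` output on stdin.
# Exit status 0 only if both replication threads report "Yes".
replication_ok() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A typical call would be mysql -e 'show slave status\G' | replication_ok && echo "replication ok". The patterns end in a colon, so the Slave_SQL_Running_State line is not mistaken for Slave_SQL_Running.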
III. MHA setup
1. Create the account MHA uses for management. It must be created on every database server; master is shown here as the example
[root@master ~]# mysql
mysql> grant all privileges on *.* to 'mha_rep'@'%' identified by '20151012'; flush privileges;
mysql> select user,host,password from mysql.user where user='mha_rep';
+---------+------+-------------------------------------------+
| user    | host | password                                  |
+---------+------+-------------------------------------------+
| mha_rep | %    | *F9B93BD42F62D26FD094239C15535EE045F7BB22 |
+---------+------+-------------------------------------------+
2. Install the mha4mysql-node package on the three hosts (master, slave01 and slave02). master is shown here; the other hosts are the same.
Strictly speaking, every node, the manager included, needs the mha4mysql-node package. Because the manager needs both the node package and the manager package, its installation is covered separately below.
[root@master ~]# yum install perl-DBD-MySQL -y
[root@master ~]# cd /usr/local/src/
[root@master ~]# wget https://downloads.mariadb.com/files/MHA/mha4mysql-node-0.54-0.el6.noarch.rpm
[root@master ~]# rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
/usr/bin/apply_diff_relay_logs    // finds differing relay log events and applies them to the other slaves
/usr/bin/filter_mysqlbinlog    // strips unnecessary ROLLBACK events (MHA no longer uses this tool)
/usr/bin/purge_relay_logs    // purges relay logs (without blocking the SQL thread)
/usr/bin/save_binary_logs    // saves and copies the master's binary logs
3. Install the mha4mysql-manager and mha4mysql-node packages on manager
[root@manager ~]# yum install perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Net-Telnet -y
Note: the yum repositories do not provide all of these packages, so the missing ones have to be downloaded and installed separately.
[root@manager ~]# cd /usr/local/src/
[root@manager ~]# wget http://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm
[root@manager ~]# wget ftp://rpmfind.net/linux/dag/redhat/el6/en/x86_64/dag/RPMS/perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm
[root@manager ~]# yum localinstall *.rpm -y
Install the manager and node packages
[root@manager ~]# cd /usr/local/src/
[root@manager ~]# wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.55-0.el6.noarch.rpm
[root@manager ~]# wget https://downloads.mariadb.com/MHA/mha4mysql-node-0.54-0.el6.noarch.rpm
[root@manager ~]# yum localinstall mha4mysql-node-0.54-0.el6.noarch.rpm -y
[root@manager ~]# yum localinstall mha4mysql-manager-0.55-0.el6.noarch.rpm -y
4. List the tools installed by mha4mysql-manager
[root@manager ~]# rpm -ql mha4mysql-manager |grep bin
/usr/bin/masterha_check_repl    // checks the MySQL replication status
/usr/bin/masterha_check_ssh    // checks MHA's SSH configuration
/usr/bin/masterha_check_status    // reports the current MHA running state
/usr/bin/masterha_conf_host    // adds or removes server entries in the configuration
/usr/bin/masterha_manager    // starts MHA
/usr/bin/masterha_master_monitor    // detects whether the master is down
/usr/bin/masterha_master_switch    // controls failover (automatic or manual)
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
5. Change the SSH port configured in the script /usr/bin/masterha_secondary_check
6. Download the mha4mysql-manager source package on the manager host
# wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.56.tar.gz
7. On the manager host, extract the MHA sample configuration file and scripts from the mha4mysql-manager source package
[root@manager ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@manager ~]# mkdir -p /app/mha/scripts
[root@manager ~]# cp mha4mysql-manager-0.56/samples/scripts/* /app/mha/scripts/
[root@manager ~]# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /app/mha/mha.cnf
[root@manager ~]# tree /app/mha/
/app/mha/
├── mha.cnf
└── scripts
    ├── master_ip_failover    // manages the VIP during automatic failover; optional. With keepalived you can write your own script to manage the VIP, for example monitor mysql and stop keepalived when mysql fails, so that the VIP floats automatically
    ├── master_ip_online_change    // manages the VIP during an online switch; optional, a simple shell script of your own also works
    ├── power_manager    // powers off the host after a failure; optional
    └── send_report    // sends an alert after a failover; optional, a simple shell script of your own also works
8. Edit the MHA configuration file on the manager as follows
[root@manager ~]# cat /app/mha/mha.cnf
[server default]
#monitoring user
user=mha_rep
#password of the monitoring user
password=20151012
#ssh login user
ssh_user=root
#replication user
repl_user=rep
#password of the replication user
repl_password=20151012
#interval between ping packets sent to the monitored master; the default is 3 seconds. Failover starts automatically after three attempts without a response
ping_interval=1
#manager working directory and log file
manager_workdir=/app/mha
manager_log=/app/mha/manager.log
# monitor mysql
#if monitoring from the MHA manager to the master runs into a problem, the manager also tries to reach the master through the hosts listed with -s
secondary_check_script= masterha_secondary_check -s 192.168.10.91 -s 192.168.10.92 -s 192.168.10.93
#script that sends an alert after a switch
report_script= /app/mha/scripts/send_report
#script used for a manual (online) switch (the sample script is flawed and must be modified)
master_ip_online_change_script= /app/mha/scripts/master_ip_online_change
#script used for automatic failover (the sample script is flawed and must be modified)
master_ip_failover_script=/app/mha/scripts/master_ip_failover
#script that powers off the failed host after a failure (its main purpose is to prevent split-brain; not used here)
#shutdown_script= /app/mha/scripts/power_manager
#check_repl_delay=0
[server1]
hostname=master
ssh_port=60022
candidate_master=1
check_repl_delay=0
master_binlog_dir=/app/mysql/data
[server2]
hostname=slave01
ssh_port=60022
candidate_master=1
check_repl_delay=0
master_binlog_dir=/app/mysql/data
[server3]
hostname=slave02
ssh_port=60022
no_master=1
master_binlog_dir=/app/mysql/data
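Before running the MHA checks, it can help to confirm which hosts the file marks as candidate masters and which are excluded. The sketch below summarizes the [serverN] sections of such a file; list_mha_servers is a hypothetical helper, not an MHA tool, and it understands only the simple key=value layout shown above.

```shell
# list_mha_servers CONF
# Prints "section hostname role" for every [serverN] section in CONF.
# role is "candidate" (candidate_master=1), "no_master" (no_master=1) or "default".
list_mha_servers() {
    awk -F'=' '
        function emit() { if (sec != "") print sec, host, role }
        /^\[/ {
            emit()
            sec = ""
            if ($0 ~ /^\[server[0-9]+\]/) {
                sec = $0
                gsub(/^\[|\].*$/, "", sec)
                host = "-"; role = "default"
            }
            next
        }
        sec == "" || /^[ \t]*#/ { next }
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
            if (key == "hostname") host = val
            if (key == "candidate_master" && val == "1") role = "candidate"
            if (key == "no_master" && val == "1") role = "no_master"
        }
        END { emit() }
    ' "$1"
}
```

Run as list_mha_servers /app/mha/mha.cnf. With the configuration above, master and slave01 should be reported as candidates and slave02 as no_master.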
9. Check that SSH connectivity works
[root@manager ~]# masterha_check_ssh --conf=/app/mha/mha.cnf
Thu Jan 25 11:50:58 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 25 11:50:58 2018 - [info] Reading application default configurations from /app/mha/mha.cnf..
Thu Jan 25 11:50:58 2018 - [info] Reading server configurations from /app/mha/mha.cnf..
Thu Jan 25 11:50:58 2018 - [info] Starting SSH connection tests..
Thu Jan 25 11:51:01 2018 - [debug]
Thu Jan 25 11:50:59 2018 - [debug]  Connecting via SSH from root@slave02(192.168.10.93:60022) to root@master(192.168.10.91:60022)..
Warning: Permanently added '[192.168.10.93]:60022' (ECDSA) to the list of known hosts.
Warning: Permanently added '[192.168.10.91]:60022' (ECDSA) to the list of known hosts.
Thu Jan 25 11:51:00 2018 - [debug]   ok.
Thu Jan 25 11:51:00 2018 - [debug]  Connecting via SSH from root@slave02(192.168.10.93:60022) to root@slave01(192.168.10.92:60022)..
Thu Jan 25 11:51:01 2018 - [debug]   ok.
Thu Jan 25 11:51:01 2018 - [debug]
Thu Jan 25 11:50:59 2018 - [debug]  Connecting via SSH from root@slave01(192.168.10.92:60022) to root@master(192.168.10.91:60022)..
Warning: Permanently added '[192.168.10.91]:60022' (ECDSA) to the list of known hosts.
Thu Jan 25 11:51:00 2018 - [debug]   ok.
Thu Jan 25 11:51:00 2018 - [debug]  Connecting via SSH from root@slave01(192.168.10.92:60022) to root@slave02(192.168.10.93:60022)..
Warning: Permanently added '[192.168.10.93]:60022' (ECDSA) to the list of known hosts.
Thu Jan 25 11:51:00 2018 - [debug]   ok.
Thu Jan 25 11:51:01 2018 - [debug]
Thu Jan 25 11:50:58 2018 - [debug]  Connecting via SSH from root@master(192.168.10.91:60022) to root@slave01(192.168.10.92:60022)..
Warning: Permanently added '[192.168.10.91]:60022' (ECDSA) to the list of known hosts.
Warning: Permanently added '[192.168.10.92]:60022' (ECDSA) to the list of known hosts.
Thu Jan 25 11:51:00 2018 - [debug]   ok.
Thu Jan 25 11:51:00 2018 - [debug]  Connecting via SSH from root@master(192.168.10.91:60022) to root@slave02(192.168.10.93:60022)..
Warning: Permanently added '[192.168.10.93]:60022' (ECDSA) to the list of known hosts.
Thu Jan 25 11:51:00 2018 - [debug]   ok.
Thu Jan 25 11:51:01 2018 - [info] All SSH connection tests passed successfully.
If you get the output above, SSH trust between the hosts is working
10. Check that replication is healthy
Because my MySQL was compiled from source, the replication check can fail with "not found" errors, for example:
(1) Can't exec "mysqlbinlog": No such file or directory at /usr/local/perl5/MHA/BinlogManager.pm line 99.
Fix:
Run the following command on master, slave01 and slave02
# ln -s /app/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
(2) mysqlbinlog: unknown variable 'default-character-set=utf8'
Fix:
On master, slave01 and slave02, comment out the default-character-set=utf8 option in the [client] section of my.cnf and restart the mysqld service
(3) Testing mysql connection and privileges..sh: mysql: command not found
Fix:
Run the following command on master, slave01 and slave02
# ln -s /app/mysql/bin/mysql /usr/bin/mysql
[root@manager ~]# masterha_check_repl --conf=/app/mha/mha.cnf
Thu Jan 25 12:11:16 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 25 12:11:16 2018 - [info] Reading application default configurations from /app/mha/mha.cnf..
Thu Jan 25 12:11:16 2018 - [info] Reading server configurations from /app/mha/mha.cnf..
Thu Jan 25 12:11:16 2018 - [info] MHA::MasterMonitor version 0.55.
Thu Jan 25 12:11:17 2018 - [info] Dead Servers:
Thu Jan 25 12:11:17 2018 - [info] Alive Servers:
Thu Jan 25 12:11:17 2018 - [info]   master(192.168.10.91:3306)
Thu Jan 25 12:11:17 2018 - [info]   slave01(192.168.10.92:3306)
Thu Jan 25 12:11:17 2018 - [info]   slave02(192.168.10.93:3306)
Thu Jan 25 12:11:17 2018 - [info] Alive Slaves:
Thu Jan 25 12:11:17 2018 - [info]   slave01(192.168.10.92:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Thu Jan 25 12:11:17 2018 - [info]     Replicating from 192.168.10.91(192.168.10.91:3306)
Thu Jan 25 12:11:17 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jan 25 12:11:17 2018 - [info]   slave02(192.168.10.93:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Thu Jan 25 12:11:17 2018 - [info]     Replicating from 192.168.10.91(192.168.10.91:3306)
Thu Jan 25 12:11:17 2018 - [info]     Not candidate for the new Master (no_master is set)
Thu Jan 25 12:11:17 2018 - [info] Current Alive Master: master(192.168.10.91:3306)
Thu Jan 25 12:11:17 2018 - [info] Checking slave configurations..
Thu Jan 25 12:11:17 2018 - [info]  read_only=1 is not set on slave slave01(192.168.10.92:3306).
Thu Jan 25 12:11:17 2018 - [info] Checking replication filtering settings..
Thu Jan 25 12:11:17 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Thu Jan 25 12:11:17 2018 - [info]  Replication filtering check ok.
Thu Jan 25 12:11:17 2018 - [info] Starting SSH connection tests..
Thu Jan 25 12:11:20 2018 - [info] All SSH connection tests passed successfully.
Thu Jan 25 12:11:20 2018 - [info] Checking MHA Node version..
Thu Jan 25 12:11:20 2018 - [info]  Version check ok.
Thu Jan 25 12:11:20 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jan 25 12:11:21 2018 - [info] HealthCheck: SSH to master is reachable.
Thu Jan 25 12:11:21 2018 - [info] Master MHA Node version is 0.54.
Thu Jan 25 12:11:21 2018 - [info] Checking recovery script configurations on the current master..
Thu Jan 25 12:11:21 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/app/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql_bin.000001
Thu Jan 25 12:11:21 2018 - [info]   Connecting to root@master(master)..
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /app/mysql/data, up to mysql_bin.000001
Thu Jan 25 12:11:22 2018 - [info] Master setting check done.
Thu Jan 25 12:11:22 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jan 25 12:11:22 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_rep' --slave_host=slave01 --slave_ip=192.168.10.92 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.27-log --manager_version=0.55 --relay_log_info=/app/mysql/data/relay-log.info  --relay_dir=/app/mysql/data/  --slave_pass=xxx
Thu Jan 25 12:11:22 2018 - [info]   Connecting to root@192.168.10.92(slave01:60022)..
  Checking slave recovery environment settings..
    Opening /app/mysql/data/relay-log.info ... ok.
    Relay log found at /app/mysql/data, up to slave01-relay-bin.000002
    Temporary relay log file is /app/mysql/data/slave01-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jan 25 12:11:22 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_rep' --slave_host=slave02 --slave_ip=192.168.10.93 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.27-log --manager_version=0.55 --relay_log_info=/app/mysql/data/relay-log.info  --relay_dir=/app/mysql/data/  --slave_pass=xxx
Thu Jan 25 12:11:22 2018 - [info]   Connecting to root@192.168.10.93(slave02:60022)..
  Checking slave recovery environment settings..
    Opening /app/mysql/data/relay-log.info ... ok.
    Relay log found at /app/mysql/data, up to slave02-relay-bin.000002
    Temporary relay log file is /app/mysql/data/slave02-relay-bin.000002
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jan 25 12:11:22 2018 - [info] Slaves settings check done.
Thu Jan 25 12:11:22 2018 - [info]
master (current master)
 +--slave01
 +--slave02
Thu Jan 25 12:11:22 2018 - [info] Checking replication health on slave01..
Thu Jan 25 12:11:22 2018 - [info]  ok.
Thu Jan 25 12:11:22 2018 - [info] Checking replication health on slave02..
Thu Jan 25 12:11:22 2018 - [info]  ok.
Thu Jan 25 12:11:22 2018 - [warning] master_ip_failover_script is not defined.
Thu Jan 25 12:11:22 2018 - [warning] shutdown_script is not defined.
Thu Jan 25 12:11:22 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
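After running the command you may also see warning messages, for example:
Tue Sep 15 23:45:35 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
This warning appears on the first line of the output: masterha_default.cnf was not found. That file is MHA's global default configuration file; since no global configuration is used here, MHA skips it, and the rest of the environment is not affected. If you do want to use it, the source package contains a template for this file, and a lightly modified copy is enough to get rid of the warning.
Tue Sep 15 23:45:38 2015 - [warning] master_ip_failover_script is not defined.
Tue Sep 15 23:45:38 2015 - [warning] shutdown_script is not defined.
These warnings appear in the last few lines of the output and mean the two options are not defined. They show up when both lines are commented out in /app/mha/mha.cnf; master_ip_failover_script is only needed later, when the VIP is set up.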
IV. MHA failover tests
1. Before each MHA test, it is best to run the following checks first
[root@manager ~]# masterha_check_ssh --conf=/app/mha/mha.cnf
[root@manager ~]# masterha_check_repl --conf=/app/mha/mha.cnf
Make sure neither command reports a problem, then start the MHA service
2. Start the MHA service on the manager and keep watching the log output
[root@manager ~]# mkdir -p /app/mha/log
[root@manager ~]# nohup masterha_manager --conf=/app/mha/mha.cnf > /app/mha/log/mha_manager.log 2>&1 &
[root@manager ~]# ps -ef |grep masterha |grep -v 'grep'
root      799234  788691  1 14:09 pts/1    00:00:00 perl /bin/masterha_manager --conf=/app/mha/mha.cnf
3. Test, phase one
Preparation: check that replication is healthy on all slaves
First, stop the mysqld process on master, then check whether the standby, slave01, has been promoted to master
Next, log in to slave02 and check that replication is healthy and now points to the new master, that is, slave01's IP address
Finally, start the mysqld process on master and add it back into the replication topology
Preparation
On slave01 and slave02, check that replication is healthy. slave01 is shown here; slave02 is the same
[root@slave01 ~]# mysql -e 'show slave status\G' |egrep 'Slave_IO_Running:|Slave_SQL_Running:'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Step one
(1) Stop the mysqld process on master
[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL.... SUCCESS!
(2) Check the MHA log on the manager; only part of the log is shown here
[root@manager ~]# tail -f /app/mha/manager.log
----- Failover Report -----
mha: MySQL Master failover master to slave01 succeeded
Master master is down!
Check MHA Manager logs at manager:/app/mha/manager.log for details.
Started automated(non-interactive) failover.
The latest slave slave01(192.168.10.92:3306) has all relay logs for recovery.
Selected slave01 as a new master.
slave01: OK: Applying all logs succeeded.
slave02: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
slave02: OK: Applying all logs succeeded. Slave started, replicating from slave01.
slave01: Resetting slave info succeeded.
Master failover to slave01(192.168.10.92:3306) completed successfully.
Thu Jan 25 13:52:52 2018 - [info] Sending mail..
Unknown option: conf
Step two
Log in to slave02 and check that replication is healthy and has moved to the new master's IP
[root@slave02 ~]# mysql -e 'show slave status\G' |egrep 'Master_Host|Slave_IO_Running:|Slave_SQL_Running:'
                  Master_Host: 192.168.10.92
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Step three
(1) Start the mysqld service on master
[root@master ~]# /etc/init.d/mysql start
Starting MySQL. SUCCESS!
(2) Find the CHANGE MASTER statement in the MHA log on the manager; it can be used once the password is filled in
[root@manager ~]# grep 'MASTER_HOST' /app/mha/manager.log |tail -n 1
Thu Jan 25 13:52:50 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave01 or 192.168.10.92', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000001', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Note:
MASTER_HOST='slave01 or 192.168.10.92' needs attention: write only one of the two values, preferably the IP address
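The two manual edits described here (keep only the IP in MASTER_HOST, replace the masked password) can be sketched as a small filter over the manager log. The function name build_change_master is hypothetical. It assumes the log line has exactly the format shown above and that the password contains no characters special to sed, such as / or &.

```shell
# build_change_master PASSWORD
# Reads MHA manager log text on stdin and prints the last suggested
# CHANGE MASTER statement with MASTER_HOST reduced to the IP address
# and the masked password 'xxx' replaced by PASSWORD.
build_change_master() {
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO .*\)/\1/p' |
    tail -n 1 |
    sed -e "s/MASTER_HOST='[^']* or \([^']*\)'/MASTER_HOST='\1'/" \
        -e "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='$1'/"
}
```

For example, build_change_master 20151012 < /app/mha/manager.log prints a statement that can be reviewed and then passed to mysql -e together with start slave. Passing a password as a command-line argument exposes it in the process list, so this is suitable for a lab only.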
(3) Start replication on master; the password is 20151012
[root@master ~]# mysql -e "CHANGE MASTER TO MASTER_HOST='192.168.10.92', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000001', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='20151012';start slave;"
[root@master ~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.92
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql_bin.000001
          Read_Master_Log_Pos: 120
               Relay_Log_File: master-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql_bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 457
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 92
                  Master_UUID: 996b4343-00f3-11e8-a3ba-b6c824ce1080
             Master_Info_File: /app/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
4. Test, phase two
Preparation: check that replication is healthy on all slaves
First, stop the mysqld process on slave01, then check whether master has been promoted to the new master
Next, log in to slave02 and check that replication is healthy and now points to the new master, that is, master's IP address
Finally, start the mysqld process on slave01 and add it back into the replication topology
Note that by default the MHA service stops after every failover.
So the MHA service has to be started again here
[root@manager ~]# rm -rf /app/mha/mha.failover.complete
[root@manager ~]# nohup masterha_manager --conf=/app/mha/mha.cnf > /app/mha/log/mha_manager.log 2>&1 &
[1] 3606
[root@manager ~]# ps -ef |grep masterha |grep -v 'grep'
root      799234  788691  1 14:09 pts/1    00:00:00 perl /bin/masterha_manager --conf=/app/mha/mha.cnf
[root@manager ~]# masterha_check_status --conf=/app/mha/mha.cnf
mha (pid:799234) is running(0:PING_OK), master:slave01  # the current master is host slave01
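When scripting around the status check, the current master's name can be pulled out of that one status line. A minimal sketch follows; current_master is a hypothetical helper, and it assumes the "is running(0:PING_OK), master:NAME" format shown above.

```shell
# current_master
# Reads masterha_check_status output on stdin and prints the master's name
# when MHA reports running(0:PING_OK); prints nothing otherwise.
current_master() {
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}
```

For example, masterha_check_status --conf=/app/mha/mha.cnf | current_master prints slave01 in the state shown above. Empty output means the manager is not reporting a healthy, monitored master.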
Preparation
On master and slave02, check that replication is healthy. master is shown here; slave02 is the same
[root@master ~]# mysql -e 'show slave status\G' |egrep 'Master_Host|Slave_IO_Running:|Slave_SQL_Running:'
                  Master_Host: 192.168.10.92
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Step one
(1) Stop the mysqld process on slave01
[root@slave01 ~]# /etc/init.d/mysql stop
Shutting down MySQL.... SUCCESS!
(2) Check the MHA log on the manager; only part of the log is shown here
[root@manager ~]# tail -f /app/mha/manager.log
----- Failover Report -----
mha: MySQL Master failover slave01 to master succeeded
Master slave01 is down!
Check MHA Manager logs at manager:/app/mha/manager.log for details.
Started automated(non-interactive) failover.
The latest slave master(192.168.10.91:3306) has all relay logs for recovery.
Selected master as a new master.
master: OK: Applying all logs succeeded.
slave02: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
slave02: OK: Applying all logs succeeded. Slave started, replicating from master.
master: Resetting slave info succeeded.
Master failover to master(192.168.10.91:3306) completed successfully.
Thu Jan 25 14:25:48 2018 - [info] Sending mail..
Unknown option: conf
Step two
Log in to slave02 and check that replication is healthy and has moved to the new master's IP
[root@slave02 ~]# mysql -e 'show slave status\G' |egrep 'Master_Host|Slave_IO_Running:|Slave_SQL_Running:'
                  Master_Host: 192.168.10.91
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
Step three
(1) Start the mysqld service on slave01
[root@slave01 ~]# /etc/init.d/mysql start
Starting MySQL. SUCCESS!
(2) Find the CHANGE MASTER statement in the MHA log on the manager; it can be used once the password is filled in
[root@manager ~]# grep 'MASTER_HOST' /app/mha/manager.log | tail -n 1
Thu Jan 25 14:25:46 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='master or 192.168.10.91', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000002', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';
Note:
MASTER_HOST='master or 192.168.10.91' needs attention: write only one of the two values, preferably the IP address
(3) Start replication on slave01; the password is 20151012
[root@slave01 ~]# mysql -e "CHANGE MASTER TO MASTER_HOST='192.168.10.91', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000002', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='20151012'; start slave;"
[root@slave01 ~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.91
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql_bin.000002
          Read_Master_Log_Pos: 120
               Relay_Log_File: slave01-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql_bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 458
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 91
                  Master_UUID: 9b1eb4a5-00f3-11e8-a3ba-ce006127c972
             Master_Info_File: /app/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
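At this point, when the master goes down the standby slave is automatically promoted to master and the other slaves are repointed to the new master's IP address. One problem remains: the master role did switch, but the service cannot expose two different IP addresses to clients. A VIP is therefore introduced, in the way keepalived/heartbeat would provide one, so that the switch is transparent to clients
There are many ways to fail over the VIP. Some setups use keepalived; others use script-based handling, such as the script provided in this article.
During the tests you will have noticed the [warning] messages in the command output; they are addressed next
First, to use the VIP feature, this option must be enabled (uncommented) in the configuration file
[root@manager ~]# grep '^master_ip_failover_script' /app/mha/mha.cnf
master_ip_failover_script=/app/mha/scripts/master_ip_failover
Next, replace the sample script and adjust a few settings in it
[root@manager ~]# mv /app/mha/scripts/{master_ip_failover,master_ip_failover_bak}
[root@manager ~]# cat /app/mha/scripts/master_ip_failover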
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
    $command, $ssh_user, $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.10.90';       # Virtual IP    # adjust as needed
my $gateway = '192.168.10.254';  # Gateway IP    # adjust as needed
my $interface = 'eth0';          # adjust as needed
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";
GetOptions(
    'command=s' => \$command,
    'ssh_user=s' => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s' => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s' => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        # Note: the status check also brings the VIP up on the current master.
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@manager ~]# chmod +x /app/mha/scripts/master_ip_failover
Run the SSH check
[root@manager ~]# masterha_check_ssh --conf=/app/mha/mha.cnf
Thu Jan 25 15:56:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 25 15:56:56 2018 - [info] Reading application default configurations from /app/mha/mha.cnf..
Thu Jan 25 15:56:56 2018 - [info] Reading server configurations from /app/mha/mha.cnf..
Thu Jan 25 15:56:56 2018 - [info] Starting SSH connection tests..
Thu Jan 25 15:56:58 2018 - [debug]
Thu Jan 25 15:56:56 2018 - [debug]  Connecting via SSH from root@master(192.168.10.91:60022) to root@slave01(192.168.10.92:60022)..
Thu Jan 25 15:56:57 2018 - [debug]   ok.
Thu Jan 25 15:56:57 2018 - [debug]  Connecting via SSH from root@master(192.168.10.91:60022) to root@slave02(192.168.10.93:60022)..
Thu Jan 25 15:56:58 2018 - [debug]   ok.
Thu Jan 25 15:56:59 2018 - [debug]
Thu Jan 25 15:56:57 2018 - [debug]  Connecting via SSH from root@slave02(192.168.10.93:60022) to root@master(192.168.10.91:60022)..
Thu Jan 25 15:56:58 2018 - [debug]   ok.
Thu Jan 25 15:56:58 2018 - [debug]  Connecting via SSH from root@slave02(192.168.10.93:60022) to root@slave01(192.168.10.92:60022)..
Thu Jan 25 15:56:58 2018 - [debug]   ok.
Thu Jan 25 15:56:59 2018 - [debug]
Thu Jan 25 15:56:57 2018 - [debug]  Connecting via SSH from root@slave01(192.168.10.92:60022) to root@master(192.168.10.91:60022)..
Thu Jan 25 15:56:58 2018 - [debug]   ok.
Thu Jan 25 15:56:58 2018 - [debug]  Connecting via SSH from root@slave01(192.168.10.92:60022) to root@slave02(192.168.10.93:60022)..
Thu Jan 25 15:56:58 2018 - [debug]   ok.
Thu Jan 25 15:56:59 2018 - [info] All SSH connection tests passed successfully.
Run the replication check
[root@manager ~]# masterha_check_repl --conf=/app/mha/mha.cnf 
Thu Jan 25 15:55:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jan 25 15:55:56 2018 - [info] Reading application default configurations from /app/mha/mha.cnf..
Thu Jan 25 15:55:56 2018 - [info] Reading server configurations from /app/mha/mha.cnf..
Thu Jan 25 15:55:56 2018 - [info] MHA::MasterMonitor version 0.55.
Thu Jan 25 15:55:57 2018 - [info] Dead Servers:
Thu Jan 25 15:55:57 2018 - [info] Alive Servers:
Thu Jan 25 15:55:57 2018 - [info]   master(192.168.10.91:3306)
Thu Jan 25 15:55:57 2018 - [info]   slave01(192.168.10.92:3306)
Thu Jan 25 15:55:57 2018 - [info]   slave02(192.168.10.93:3306)
Thu Jan 25 15:55:57 2018 - [info] Alive Slaves:
Thu Jan 25 15:55:57 2018 - [info]   slave01(192.168.10.92:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Thu Jan 25 15:55:57 2018 - [info]     Replicating from 192.168.10.91(192.168.10.91:3306)
Thu Jan 25 15:55:57 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Jan 25 15:55:57 2018 - [info]   slave02(192.168.10.93:3306)  Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Thu Jan 25 15:55:57 2018 - [info]     Replicating from 192.168.10.91(192.168.10.91:3306)
Thu Jan 25 15:55:57 2018 - [info]     Not candidate for the new Master (no_master is set)
Thu Jan 25 15:55:57 2018 - [info] Current Alive Master: master(192.168.10.91:3306)
Thu Jan 25 15:55:57 2018 - [info] Checking slave configurations..
Thu Jan 25 15:55:57 2018 - [info]  read_only=1 is not set on slave slave01(192.168.10.92:3306).
Thu Jan 25 15:55:57 2018 - [info] Checking replication filtering settings..
Thu Jan 25 15:55:57 2018 - [info]  binlog_do_db= , binlog_ignore_db=
Thu Jan 25 15:55:57 2018 - [info]  Replication filtering check ok.
Thu Jan 25 15:55:57 2018 - [info] Starting SSH connection tests..
Thu Jan 25 15:55:59 2018 - [info] All SSH connection tests passed successfully.
Thu Jan 25 15:55:59 2018 - [info] Checking MHA Node version..
Thu Jan 25 15:56:00 2018 - [info]  Version check ok.
Thu Jan 25 15:56:00 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jan 25 15:56:00 2018 - [info] HealthCheck: SSH to master is reachable.
Thu Jan 25 15:56:01 2018 - [info] Master MHA Node version is 0.54.
Thu Jan 25 15:56:01 2018 - [info] Checking recovery script configurations on the current master..
Thu Jan 25 15:56:01 2018 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/app/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.55 --start_file=mysql_bin.000004
Thu Jan 25 15:56:01 2018 - [info]   Connecting to root@master(master)..
  Creating /var/tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /app/mysql/data, up to mysql_bin.000004
Thu Jan 25 15:56:02 2018 - [info] Master setting check done.
Thu Jan 25 15:56:02 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jan 25 15:56:02 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_rep' --slave_host=slave01 --slave_ip=192.168.10.92 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.27-log --manager_version=0.55 --relay_log_info=/app/mysql/data/relay-log.info  --relay_dir=/app/mysql/data/  --slave_pass=xxx
Thu Jan 25 15:56:02 2018 - [info]   Connecting to root@192.168.10.92(slave01:60022)..
  Checking slave recovery environment settings..
    Opening /app/mysql/data/relay-log.info ... ok.
    Relay log found at /app/mysql/data, up to slave01-relay-bin.000006
    Temporary relay log file is /app/mysql/data/slave01-relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jan 25 15:56:02 2018 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha_rep' --slave_host=slave02 --slave_ip=192.168.10.93 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.27-log --manager_version=0.55 --relay_log_info=/app/mysql/data/relay-log.info  --relay_dir=/app/mysql/data/  --slave_pass=xxx
Thu Jan 25 15:56:02 2018 - [info]   Connecting to root@192.168.10.93(slave02:60022)..
  Checking slave recovery environment settings..
    Opening /app/mysql/data/relay-log.info ... ok.
    Relay log found at /app/mysql/data, up to slave02-relay-bin.000006
    Temporary relay log file is /app/mysql/data/slave02-relay-bin.000006
    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jan 25 15:56:02 2018 - [info] Slaves settings check done.
Thu Jan 25 15:56:02 2018 - [info]
master (current master)
+--slave01
+--slave02
Thu Jan 25 15:56:02 2018 - [info] Checking replication health on slave01..
Thu Jan 25 15:56:02 2018 - [info]  ok.
Thu Jan 25 15:56:02 2018 - [info] Checking replication health on slave02..
Thu Jan 25 15:56:02 2018 - [info]  ok.
Thu Jan 25 15:56:02 2018 - [info] Checking master_ip_failover_script status:
Thu Jan 25 15:56:02 2018 - [info]   /app/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=master --orig_master_ip=192.168.10.91 --orig_master_port=3306  --orig_master_ssh_port=60022
Unknown option: orig_master_ssh_port
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.10.90;/sbin/arping -I eth0 -c 3 -s 192.168.10.90 192.168.10.254 >/dev/null 2>&1===
Checking the Status of the script.. OK
Thu Jan 25 15:56:06 2018 - [info]  OK.
Thu Jan 25 15:56:06 2018 - [warning] shutdown_script is not defined.
Thu Jan 25 15:56:06 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
If the line "[warning] master_ip_failover_script is not defined." no longer appears in the command output, the feature is enabled.
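The check described above can be scripted. The following is a minimal sketch, assuming the output of masterha_check_repl has been captured into a string; the function name vip_script_state is hypothetical and not part of MHA.

```shell
# Hypothetical helper: prints "disabled" when the captured output contains the
# "master_ip_failover_script is not defined" warning, and "enabled" otherwise.
vip_script_state() {
    if printf '%s\n' "$1" | grep -Fq 'master_ip_failover_script is not defined'; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

It could be called as vip_script_state "$(masterha_check_repl --conf=/app/mha/mha.cnf 2>&1)". Note that it only looks for the warning text; it does not verify that the script itself works.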
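Next, start the MHA service.
The rest of the procedure is roughly as follows:
Preparation: start the MHA service
First, stop the mysqld process on master so that slave01 is promoted to master and takes over the VIP
Second, check that replication on the other slave, slave02, is healthy and has been repointed to the new master's address
Finally, start the mysqld process on master and rejoin it to the replication setup
Preparation
[root@manager ~]# rm -rf /app/mha/mha.failover.complete 
[root@manager ~]# nohup masterha_manager --conf=/app/mha/mha.cnf > /app/mha/log/mha_manager.log 2>&1 &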
[1] 4066
[root@manager ~]# ps -ef |grep masterha |grep -v 'grep'
root      805559  805262  0 15:52 pts/2    00:00:00 perl /bin/masterha_check_repl --conf=/app/mha/mha.cnf
root      806133  805710  0 15:58 pts/3    00:00:00 perl /bin/masterha_manager --conf=/app/mha/mha.cnf
[root@manager ~]# masterha_check_status --conf=/app/mha/mha.cnf
ha (pid:806133) is running(0:PING_OK), master:master
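For monitoring, the status line above can be parsed to find out which host MHA currently treats as master. This is a small sketch that assumes the line has the same shape as the output shown here; the function name current_master is hypothetical.

```shell
# Hypothetical helper: prints the host after "master:" when the status line
# reports a running manager with PING_OK, and returns non-zero otherwise.
current_master() {
    line="$1"
    case "$line" in
        *"is running(0:PING_OK)"*"master:"*)
            # Strip everything up to and including the last "master:".
            printf '%s\n' "${line##*master:}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, current_master "$(masterha_check_status --conf=/app/mha/mha.cnf)" would print the master host name while the manager is healthy.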
First step
(1) On master, stop the mysqld service process
[root@master ~]# /etc/init.d/mysql stop
Shutting down MySQL.... SUCCESS! 
(2) Check the MHA log on manager; only part of the log is shown here
[root@manager ~]# tail -f /app/mha/manager.log
Enabling the VIP - 192.168.10.90 on the new master - slave01
# The VIP 192.168.10.90 has been brought up on the new master, which is slave01
----- Failover Report -----
mha: MySQL Master failover master to slave01 succeeded
# The master role has moved from master to slave01
Master master is down!
# The host master is down
Check MHA Manager logs at manager:/app/mha/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on master.
The latest slave slave01(192.168.10.92:3306) has all relay logs for recovery.
Selected slave01 as a new master.
slave01: OK: Applying all logs succeeded.
slave01: OK: Activated master IP address.
slave02: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
slave02: OK: Applying all logs succeeded. Slave started, replicating from slave01.
slave01: Resetting slave info succeeded.
Master failover to slave01(192.168.10.92:3306) completed successfully.
Thu Jan 25 16:00:26 2018 - [info] Sending mail..
Unknown option: conf
(3) Log in to slave01 and check whether it has acquired the VIP
[root@slave01 ~]# ip addr list
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc mq state UP qlen 1000
    link/ether b6:c8:24:ce:10:80 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.92/24 brd 192.168.10.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.10.90/24 brd 192.168.10.255 scope global secondary eth0:1
       valid_lft forever preferred_lft forever
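The same VIP check can be done without reading the output by eye. The sketch below assumes the text of ip addr list is passed in as a string; the function name has_vip is hypothetical.

```shell
# Hypothetical helper: succeeds when the given "ip addr" output contains an
# "inet <vip>/" entry. grep -F treats the dots in the address literally.
has_vip() {
    vip="$1"
    addr_output="$2"
    printf '%s\n' "$addr_output" | grep -Fq "inet ${vip}/"
}
```

On slave01 this could be used as has_vip 192.168.10.90 "$(ip addr list)" && echo "VIP present".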
Second step
Log in to slave02 and check that replication is healthy and now points to the new master's IP
[root@slave02 ~]# mysql -e 'show slave status\G' |egrep 'Master_Host|Slave_IO_Running:|Slave_SQL_Running:' 
                  Master_Host: 192.168.10.92
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
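The three fields above are what decide whether the slave is following the new master. A minimal sketch of that test, assuming the show slave status\G text is passed in as a string; the function name slave_follows is hypothetical.

```shell
# Hypothetical helper: succeeds only when Master_Host equals the expected
# address and both replication threads report "Yes".
slave_follows() {
    expected_master="$1"
    status="$2"
    printf '%s\n' "$status" | awk -v want="$expected_master" '
        $1 == "Master_Host:"       { host = $2 }
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(host == want && io == "Yes" && sql == "Yes") }
    '
}
```

On slave02 this could be run as slave_follows 192.168.10.92 "$(mysql -e 'show slave status\G')" && echo "replicating from new master".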
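Final step
(1) Start the mysqld service on master
[root@master ~]# /etc/init.d/mysql start
Starting MySQL. SUCCESS! 
(2) On manager, find the replication SQL statement in the MHA log. Only the host (pick either the name or the IP) and the password need to be filled in before it can be used
[root@manager ~]# grep 'MASTER_HOST' /app/mha/manager.log | tail -n 1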
Thu Jan 25 16:00:21 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='slave01 or 192.168.10.92', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000002', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='xxx';
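Turning that log line into a usable statement is a mechanical edit, so it can be sketched with sed. The function name build_change_master is hypothetical, and the sketch assumes the log line has exactly the shape shown above, with the IP after " or " and the password masked as xxx. A password containing "/" or "&" would need escaping first.

```shell
# Hypothetical helper: takes the MHA log line and a password, and prints the
# CHANGE MASTER statement with the IP as host and the real password filled in.
build_change_master() {
    log_line="$1"
    password="$2"
    printf '%s\n' "$log_line" \
        | sed -e 's/^.*Statement should be: //' \
              -e "s/MASTER_HOST='[^']* or \([^']*\)'/MASTER_HOST='\1'/" \
              -e "s/MASTER_PASSWORD='xxx'/MASTER_PASSWORD='${password}'/"
}
```

The printed statement still has to be run on the recovering host, followed by start slave, as in the next step.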
(3) Start replication on master, using the password 20151012
[root@master ~]# mysql -e "CHANGE MASTER TO MASTER_HOST='192.168.10.92', MASTER_PORT=3306, MASTER_LOG_FILE='mysql_bin.000002', MASTER_LOG_POS=120, MASTER_USER='rep', MASTER_PASSWORD='20151012'; start slave;"
[root@master ~]# mysql -e "show slave status\G"
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.92
                  Master_User: rep
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql_bin.000002
          Read_Master_Log_Pos: 120
               Relay_Log_File: c0a80a5b-relay-bin.000002
                Relay_Log_Pos: 283
        Relay_Master_Log_File: mysql_bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 120
              Relay_Log_Space: 459
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 92
                  Master_UUID: 996b4343-00f3-11e8-a3ba-b6c824ce1080
             Master_Info_File: /app/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0