1. Evolution of Master-Slave Replication Architectures
1.1 Basic Topologies
(1) One master, one slave
(2) One master, multiple slaves
(3) Multi-level (cascading) master-slave
(4) Dual master
(5) Circular replication
1.2 Evolution of Advanced Application Architectures
1.2.1 High-Performance Architectures
Read/write splitting (improves read performance)
Application (code) level
MySQL proxy middleware: Atlas, MySQL Router, ProxySQL (Percona), MaxScale,
Amoeba (Taobao),
xx-dbproxy, etc.
Distributed architectures (improve both read and write performance):
Sharding (splitting databases and tables): Cobar ---> TDDL, DRDS
Mycat ---> DBLE (developed in-house), etc.
NewSQL --> TiDB
1.2.2 High-Availability Architectures
(1) Single-active: MMM (mysql-mmm, Google)
(2) Single-active: MHA (mysql-master-ha, DeNA Japan), T-MHA
(3) Multi-active: MGR, MySQL Group Replication, new in 5.7 (5.7.17) ---> InnoDB Cluster
(4) Multi-active: MariaDB Galera Cluster, Percona XtraDB Cluster (PXC), MySQL Cluster (comparable to Oracle RAC)
2. High Availability with MHA *****
2.1 How the Architecture Works
Master-failure handling workflow
1. Monitoring the nodes (all node information comes from the configuration file)
System, network, and SSH connectivity
Master/slave status, focusing on the master
2. Electing a new master
(1) If the slaves' data differs (judged by position or GTID), the slave closest to the master becomes the candidate master
(2) If the slaves' data is identical, the new master is chosen in configuration-file order
(3) If a weight is set (candidate_master=1), the candidate master is forced by weight
1. By default, if a slave is more than 100MB of relay logs behind the master, the weight is ignored
2. If check_repl_delay=0, the node is forced to be the candidate master even if it lags far behind
3. Data compensation
(1) If SSH is reachable: each slave compares its GTID or position with the master, and the missing binary log events are saved to and applied on each slave node (save_binary_logs)
(2) If SSH is unreachable: compare the relay-log differences between the slaves (apply_diff_relay_logs)
4. Failover
Promote the candidate master and bring it into service
Point the remaining slaves at the new master and confirm the new replication relationships
5. Application transparency (VIP)
6. Failover notification (send_report)
7. Secondary data compensation (binlog_server)
8. Self-healing / autonomy (not yet implemented...)
2.2 Architecture Overview:
1 master, 2 slaves (master: db01; slaves: db02, db03):
MHA high-availability software components
Manager package: installed on one chosen slave node
Node package: installed on every node
2.3 MHA Software Components
The Manager toolkit mainly includes the following tools:
masterha_manager          start MHA
masterha_check_ssh        check MHA's SSH configuration
masterha_check_repl       check MySQL replication status
masterha_master_monitor   detect whether the master is down
masterha_check_status     check the current MHA running status
masterha_master_switch    control failover (automatic or manual)
masterha_conf_host        add or remove configured server information
The Node toolkit mainly includes the following tools
(these are normally invoked by MHA Manager scripts and need no manual operation):
save_binary_logs          save and copy the master's binary logs
apply_diff_relay_logs     identify differential relay-log events and apply them to the other slaves
purge_relay_logs          purge relay logs (without blocking the SQL thread)
2.4 Building the MHA Environment
2.4.1 Planning:
Master: 10.0.0.51  node
Slaves:
        10.0.0.52  node
        10.0.0.53  node, manager
2.4.2 Prepare the environment (omitted; 1 master and 2 slaves with GTID-based replication)
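For reference, a minimal sketch of the replication-related my.cnf settings this environment assumes (values are illustrative; adjust server_id and paths per node):
[mysqld]
server_id=51                      # 52 on db02, 53 on db03
log_bin=/data/binlog/mysql-bin    # matches master_binlog_dir=/data/binlog in app1.cnf
gtid_mode=ON
enforce_gtid_consistency=ON
log_slave_updates=1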
2.4.3 Create symlinks for the key client programs
ln -s /data/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /data/mysql/bin/mysql /usr/bin/mysql
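Verify that the symlinks resolve, for example:
which mysqlbinlog mysql
mysqlbinlog --version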
2.4.4 Configure SSH mutual trust between all nodes
db01:
rm -rf /root/.ssh
ssh-keygen
cd /root/.ssh
mv id_rsa.pub authorized_keys
scp -r /root/.ssh 10.0.0.52:/root
scp -r /root/.ssh 10.0.0.53:/root
Verify from each node
db01:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db02:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
db03:
ssh 10.0.0.51 date
ssh 10.0.0.52 date
ssh 10.0.0.53 date
2.4.5 Install the Software
Download the MHA packages
MHA project site: https://code.google.com/archive/p/mysql-master-ha/
GitHub download page: https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
Install the Node package and its dependencies on all nodes
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Create the user MHA needs on the master (db01)
grant all privileges on *.* to mha@'10.0.0.%' identified by 'mha';
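Optionally verify from db03 that the mha user can reach every node, for example:
mysql -umha -pmha -h 10.0.0.51 -e "select @@server_id;"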
Install the Manager package (db03)
yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
2.4.6 Prepare the Configuration File (db03)
Create the configuration directory
mkdir -p /etc/mha
Create the log directory
mkdir -p /var/log/mha/app1
Edit the MHA configuration file
vim /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/binlog
user=mha
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
[server1]
hostname=10.0.0.51
port=3306
[server2]
hostname=10.0.0.52
port=3306
[server3]
hostname=10.0.0.53
port=3306
2.4.7 Status Checks
### SSH trust check
masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Apr 19 16:39:34 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:39:34 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:39:34 2019 - [info] Starting SSH connection tests..
Fri Apr 19 16:39:35 2019 - [debug]
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:34 2019 - [debug] ok.
Fri Apr 19 16:39:34 2019 - [debug] Connecting via SSH from root@10.0.0.51(10.0.0.51:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:36 2019 - [debug]
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.52(10.0.0.52:22) to root@10.0.0.53(10.0.0.53:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:37 2019 - [debug]
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.51(10.0.0.51:22)..
Fri Apr 19 16:39:35 2019 - [debug] ok.
Fri Apr 19 16:39:35 2019 - [debug] Connecting via SSH from root@10.0.0.53(10.0.0.53:22) to root@10.0.0.52(10.0.0.52:22)..
Fri Apr 19 16:39:36 2019 - [debug] ok.
Fri Apr 19 16:39:37 2019 - [info] All SSH connection tests passed successfully.
Replication status check
[root@db03 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Apr 19 16:40:50 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Apr 19 16:40:50 2019 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:50 2019 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Apr 19 16:40:50 2019 - [info] MHA::MasterMonitor version 0.56.
Fri Apr 19 16:40:51 2019 - [info] GTID failover mode = 1
Fri Apr 19 16:40:51 2019 - [info] Dead Servers:
Fri Apr 19 16:40:51 2019 - [info] Alive Servers:
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.52(10.0.0.52:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Alive Slaves:
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.52(10.0.0.52:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info] GTID ON
Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] 10.0.0.53(10.0.0.53:3306) Version=5.7.20-log (oldest major version between slaves) log-bin:enabled
Fri Apr 19 16:40:51 2019 - [info] GTID ON
Fri Apr 19 16:40:51 2019 - [info] Replicating from 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Current Alive Master: 10.0.0.51(10.0.0.51:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking slave configurations..
Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.52(10.0.0.52:3306).
Fri Apr 19 16:40:51 2019 - [info] read_only=1 is not set on slave 10.0.0.53(10.0.0.53:3306).
Fri Apr 19 16:40:51 2019 - [info] Checking replication filtering settings..
Fri Apr 19 16:40:51 2019 - [info] binlog_do_db= , binlog_ignore_db=
Fri Apr 19 16:40:51 2019 - [info] Replication filtering check ok.
Fri Apr 19 16:40:51 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Apr 19 16:40:51 2019 - [info] Checking SSH publickey authentication settings on the current master..
Fri Apr 19 16:40:51 2019 - [info] HealthCheck: SSH to 10.0.0.51 is reachable.
Fri Apr 19 16:40:51 2019 - [info]
10.0.0.51(10.0.0.51:3306) (current master)
+--10.0.0.52(10.0.0.52:3306)
+--10.0.0.53(10.0.0.53:3306)
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.52..
Fri Apr 19 16:40:51 2019 - [info] ok.
Fri Apr 19 16:40:51 2019 - [info] Checking replication health on 10.0.0.53..
Fri Apr 19 16:40:51 2019 - [info] ok.
Fri Apr 19 16:40:51 2019 - [warning] master_ip_failover_script is not defined.
Fri Apr 19 16:40:51 2019 - [warning] shutdown_script is not defined.
Fri Apr 19 16:40:51 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
2.4.8 Start MHA (db03):
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
2.4.9 Check MHA Status
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4719) is running(0:PING_OK), master:10.0.0.51
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.51 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 51 |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.52 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 52 |
+---------------+-------+
[root@db03 ~]# mysql -umha -pmha -h 10.0.0.53 -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 53 |
+---------------+-------+
2.4.10 Failure Simulation and Recovery
### Stop the master (db01):
/etc/init.d/mysqld stop
Watch the manager log: tail -f /var/log/mha/app1/manager
The switchover is only considered successful if the log ends with "successfully".
Repair the old master
[root@db01 ~]# /etc/init.d/mysqld start
Restore the replication topology (point the repaired db01 at the new master, db02)
CHANGE MASTER TO
MASTER_HOST='10.0.0.52',
MASTER_PORT=3306,
MASTER_AUTO_POSITION=1,
MASTER_USER='repl',
MASTER_PASSWORD='123';
start slave ;
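If you are not sure which statement to use, the manager log recorded the recommended CHANGE MASTER statement during the failover; for example:
grep -i "change master to" /var/log/mha/app1/manager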
Add the repaired node back to the configuration file (it was removed by --remove_dead_master_conf)
[server1]
hostname=10.0.0.51
port=3306
Start MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
2.4.11 Additional Manager Parameters
Background:
Who takes over when the master goes down?
1. If all slave nodes have identical logs, a new master is chosen in configuration-file order by default.
2. If the slaves' logs differ, the slave closest to the master is chosen automatically.
3. If a node has been given a weight (candidate_master=1), that node is preferred.
However, if its logs are more than 100MB behind the master, it will still not be chosen. This can be combined with check_repl_delay=0 to disable the lag check and force the candidate node to be selected.
(1) ping_interval=1
# Interval, in seconds, between ping packets sent to monitor the master; after three consecutive missed responses a failover is triggered automatically.
(2) candidate_master=1
# Marks the node as a candidate master: on failover this slave is promoted to master, even if it is not the slave with the most recent events in the cluster.
(3) check_repl_delay=0
# By default, if a slave is more than 100MB of relay logs behind the master, MHA will not select it as the new master, because recovering that slave would take a long time.
Setting check_repl_delay=0 makes MHA ignore replication delay when selecting the new master. This parameter is very useful together with candidate_master=1, because it guarantees that the candidate host becomes the new master during the switchover.
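For example, to prefer db02 as the new master regardless of its relay-log lag, the two parameters can be added to its server section in /etc/mha/app1.cnf:
[server2]
hostname=10.0.0.52
port=3306
candidate_master=1
check_repl_delay=0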
2.4.12 MHA VIP Functionality
Parameter
master_ip_failover_script=/usr/local/bin/master_ip_failover
Note: /usr/local/bin/master_ip_failover must be prepared in advance
Edit the script and set the VIP-related variables
vi /usr/local/bin/master_ip_failover
my $vip = '10.0.0.55/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
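The rest of the script is not shown above; a minimal sketch of the commonly used master_ip_failover script that these variables plug into looks roughly like the following (a reference sketch, not the author's exact script; verify against the MHA sample scripts before using it):
#!/usr/bin/env perl
# Reference sketch: MHA calls this script with --command=stop|stopssh|start|status
# plus the old/new master details, and it moves the VIP accordingly over SSH.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port,
     $new_master_host, $new_master_ip, $new_master_port );

my $vip = '10.0.0.55/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Phase 1 of failover: take the VIP down on the old (failed) master
        eval { &stop_vip(); };
        if ($@) { warn "Error taking VIP down: $@\n"; exit 10; }
        exit 0;
    }
    elsif ( $command eq "start" ) {
        # Phase 2 of failover: bring the VIP up on the new master
        eval { &start_vip(); };
        if ($@) { warn "Error bringing VIP up: $@\n"; exit 10; }
        exit 0;
    }
    elsif ( $command eq "status" ) {
        exit 0;    # health check only
    }
    else {
        exit 1;
    }
}

sub start_vip { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; }
sub stop_vip  { `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; }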
Update the manager configuration file:
vi /etc/mha/app1.cnf
Add:
master_ip_failover_script=/usr/local/bin/master_ip_failover
Note: convert the script to Unix line endings and make it executable:
[root@db03 ~]# dos2unix /usr/local/bin/master_ip_failover
dos2unix: converting file /usr/local/bin/master_ip_failover to Unix format ...
[root@db03 ~]# chmod +x /usr/local/bin/master_ip_failover
On the master, bring up the first VIP manually
Bind the VIP by hand on the master; the interface must match the ethN used in the script. Here that is eth0:1 (1 is the value set by $key).
ifconfig eth0:1 10.0.0.55/24
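Verify that the VIP is up, for example:
ifconfig eth0:1
ip addr show eth0 | grep 10.0.0.55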
Restart MHA
masterha_stop --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
2.4.13 Email Alerts
1. Parameter:
report_script=/usr/local/bin/send
2. Prepare the email script
send_report
(1) Prepare the mail-sending script (upload the script from email_2019-最新.zip to /usr/local/bin/)
(2) Add the prepared script to the MHA configuration file so that it gets called
3. Modify the manager configuration file to call the email script
vi /etc/mha/app1.cnf
report_script=/usr/local/bin/send
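If the script from the zip archive is not at hand, a hypothetical placeholder like the one below can be used for testing. It does not assume any particular argument interface; it simply mails whatever arguments the manager passes, plus the tail of the manager log (the mailbox address and a working mail/mailx setup are assumptions):
#!/bin/bash
# /usr/local/bin/send -- hypothetical placeholder report script
TO="dba@example.com"                 # assumption: change to your own mailbox
LOG=/var/log/mha/app1/manager
{
  echo "MHA failover report. Arguments passed by the manager:"
  echo "$@"
  echo "---- last 50 lines of the manager log ----"
  tail -n 50 "$LOG"
} | mail -s "MHA app1 failover" "$TO"
Remember to run chmod +x /usr/local/bin/send.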
4. Stop MHA
masterha_stop --conf=/etc/mha/app1.cnf
5. Start MHA
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
6. Shut down the master and check for the alert email
Failure recovery:
1. Recover the failed node
(1) If only the instance crashed:
/etc/init.d/mysqld start
(2) If the host is damaged, the data may be damaged as well:
Rebuild the failed node from a backup.
2. Restore replication
Get the CHANGE MASTER statement from the manager log:
CHANGE MASTER TO MASTER_HOST='10.0.0.52', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
start slave ;
3. Restore the manager
3.1 Add the repaired node's configuration back into the configuration file
[server1]
hostname=10.0.0.51
port=3306
3.2 Start the manager
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
2.4.14 Binlog Server (db03)
Parameter:
binlog server configuration:
Use a separate machine running MySQL 5.6 or later with GTID enabled; here we simply reuse the second slave (db03)
vim /etc/mha/app1.cnf
[binlog1]
no_master=1
hostname=10.0.0.53
master_binlog_dir=/data/mysql/binlog
Create the required directory
mkdir -p /data/mysql/binlog
chown -R mysql.mysql /data/*
Once the configuration is done, pull the master's binlogs over (start from 000001; later binlogs are fetched automatically in order)
Pull the master's binlog
cd /data/mysql/binlog      # you must be inside the directory you created
mysqlbinlog -R --host=10.0.0.52 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &
Note:
The starting point of the pull should be the binary log that the slaves have already reached.
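To see which binlog the slave has currently reached (and therefore which file to start the pull from), check the slave status on db03, for example:
mysql -e "show slave status\G" | grep -i master_log_file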
Restart MHA
masterha_stop --conf=/etc/mha/app1.cnf
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
Failure handling
When the master goes down, the binlog server stops automatically, and the manager also stops.
Recovery approach:
1. Pull the new master's binlogs into the binlog server again
2. Update the binlog server information in the configuration file
3. Finally, restart MHA
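A sketch of step 1, assuming 10.0.0.51 is whichever node became the new master after the failover (the starting file mysql-bin.000001 is illustrative; check which binlogs the new master actually has):
pkill -f "mysqlbinlog.*--stop-never"                     # stop the pull against the old master
cd /data/mysql/binlog
mysql -h 10.0.0.51 -umha -pmha -e "show binary logs;"    # see which file to start from
mysqlbinlog -R --host=10.0.0.51 --user=mha --password=mha --raw --stop-never mysql-bin.000001 &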
3. The Administrator's Responsibilities in Maintaining the HA Architecture
1. Building: MHA + VIP + SendReport + BinlogServer
2. Monitoring and failure handling
3. Optimizing the HA architecture
The key is to keep master-slave replication lag as low as possible, so that MHA spends as little time as possible on data compensation.
On 5.7, enable GTID mode and enable parallel SQL-thread replication on the slaves.
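For example, the slave-side my.cnf settings typically used for this on 5.7 (the worker count is illustrative):
gtid_mode=ON
enforce_gtid_consistency=ON
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=8
master_info_repository=TABLE
relay_log_info_repository=TABLE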