- Make sure all cluster machines are on the same subnet and can reach each other over the network; otherwise cluster setup will fail.
- Use hostnames consistently throughout the configuration, and update the hosts file on every machine accordingly.
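For example, for a three-node cluster (the addresses below are hypothetical, chosen to match the hostnames that appear in the errors later in these notes), every machine's /etc/hosts would carry the same three lines:

```
192.168.1.101   host-192-168-1-101
192.168.1.102   host-192-168-1-102
192.168.1.103   host-192-168-1-103
```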
- Error
ERROR:
Group Replication join failed.
ERROR: Error joining instance to cluster: 'host-192-168-1-101:3306' - Query failed. MySQL Error (3092): The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication
Fix it as follows:
mysql> install plugin group_replication soname 'group_replication.so'; ## install the plugin
mysql> set global group_replication_allow_local_disjoint_gtids_join=ON;
mysql> START GROUP_REPLICATION;
mysql> select * from performance_schema.replication_group_members;
- Error
Dba.getCluster: This function is not available through a session to a standalone instance (RuntimeError)
This means the cluster's primary is no longer on this machine; find out which machine is now the primary, connect there, and retry.
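One way to find the current primary before reconnecting (on MySQL 5.7 and early 8.0 via the group_replication_primary_member status variable; on MySQL 8.0.2+ the MEMBER_ROLE column of replication_group_members shows it directly):

```sql
-- resolve the primary member's UUID to a host and port
SELECT member_host, member_port
  FROM performance_schema.replication_group_members
 WHERE member_id = (SELECT variable_value
                      FROM performance_schema.global_status
                     WHERE variable_name = 'group_replication_primary_member');
```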
- Error
Dba.getCluster: Unable to get cluster. The instance 'host-192-168-1-101:3306'
may belong to a different ReplicaSet as the one registered in the Metadata since the value of 'group_replication_group_name'
does not match the one registered in the ReplicaSet's Metadata: possible split-brain scenario. Please connect to another member of the ReplicaSet to get the Cluster. (RuntimeError)
This is the most fatal error: the master and slave data have diverged. There is no repair; the cluster has to be rebuilt from scratch:
mysql-js> dba.dropMetadataSchema();
- Make sure every table in the cluster has a primary key; otherwise the cluster will break.
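A sketch of a query to audit for offending tables (it lists base tables outside the system schemas that have no PRIMARY KEY constraint):

```sql
SELECT t.table_schema, t.table_name
  FROM information_schema.tables t
  LEFT JOIN information_schema.table_constraints c
         ON c.table_schema    = t.table_schema
        AND c.table_name      = t.table_name
        AND c.constraint_type = 'PRIMARY KEY'
 WHERE t.table_type = 'BASE TABLE'
   AND t.table_schema NOT IN ('mysql','sys','information_schema','performance_schema')
   AND c.constraint_name IS NULL;
```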
- Install cluster monitoring so that failed nodes are restarted promptly; if all nodes go down at once, disaster has arrived, and by then it will be too late.
- How to reset an InnoDB Cluster environment
On the primary node:
mysql-js> dba.dropMetadataSchema();   # log in to mysql-shell and clear the cluster metadata
mysql> stop group_replication;
mysql> reset master;   # clears the binlogs; make sure the tables do not conflict with the replicas
mysql> reset slave;
On the other nodes (this mainly clears the replication state against the primary; again, make sure the tables do not conflict with the primary):
mysql> stop group_replication;
mysql> reset master;
mysql> reset slave;
- Hostname does not match the name in /etc/hosts
The errors look like:
[Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r0430970923@mysql3:3306' - retry-time: 60 retries: 1, Error_code: MY-002005
[ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
[ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
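To debug this, compare what the member advertises with what /etc/hosts on every node resolves; if they differ, pinning report_host in my.cnf is one fix (the hostname below is taken from the error above and is only illustrative):

```sql
-- what this member advertises to the rest of the group
SELECT @@hostname, @@report_host;
SELECT member_host, member_port FROM performance_schema.replication_group_members;
```

Then, in my.cnf on the affected member:

```
[mysqld]
report_host = mysql3    # must resolve to this machine on every member
```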
- The primary's log applier is stuck at a position and changes cannot be applied on the replica
The errors look like:
[ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000007' position 151
[ERROR] [MY-010584] [Repl] Slave SQL for channel 'group_replication_applier': Error executing row event: 'Unknown database 'mysql_innodb_cluster_metadata'', Error_code: MY-001049
Rebuild the master:
mysql> stop group_replication;
mysql> reset master;
- Error
[ERROR] Slave SQL for channel 'group_replication_recovery': Could not execute Write_rows event on table mysql_innodb_cluster_metadata.instances;
Cannot add or update a child row: a foreign key constraint fails (mysql_innodb_cluster_metadata.instances, CONSTRAINT instances_ibfk_1 FOREIGN KEY (host_id) REFERENCES hosts (host_id)),
Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log binlog.000001, end_log_pos 3059, Error_code: 1452
Fix: empty the table mysql_innodb_cluster_metadata.hosts, then rebuild the cluster.
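Concretely, something like the following (a sketch: foreign-key checks have to be disabled for the session because mysql_innodb_cluster_metadata.instances references hosts):

```sql
SET FOREIGN_KEY_CHECKS = 0;
DELETE FROM mysql_innodb_cluster_metadata.hosts;
SET FOREIGN_KEY_CHECKS = 1;
```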
- Error
This member has more executed transactions than those present in the group
Fix:
mysql> stop group_replication;
mysql> reset master;
- Operating-system resource limits on the mysql user
[Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 5000)
[Warning] Buffered warning: Changed limits: table_open_cache: 431 (requested 2000)
Fix:
# vim /etc/security/limits.conf   # then add the lines below
mysql soft nproc 2047
mysql hard nproc 16384
mysql soft nofile 1024
mysql hard nofile 65535
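Note that on systemd-managed hosts, limits.conf does not apply to services started by systemd; there the limits must also be raised in a unit override (the unit name mysqld.service is an assumption; it may be mysql.service depending on the distribution):

```
# /etc/systemd/system/mysqld.service.d/limits.conf
[Service]
LimitNOFILE=65535
LimitNPROC=16384
```

Then run systemctl daemon-reload and restart the service.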
- Error
dba.rebootClusterFromCompleteOutage: The active session instance isn't the most updated in comparison with the ONLINE instances of the
Cluster's metadata.
Tables on some machines were modified while the cluster was down, leaving the data inconsistent.
Fix:
Clear the binlogs on every MySQL machine with reset master:
mysql> reset master;
mysql> show master logs;
Then run dba.rebootClusterFromCompleteOutage() again to restart the cluster.
- service mysql restart cannot restart MySQL; mysqld is stuck and keeps logging '[Note] Plugin group_replication reported: '[GCS] cli_err 2''
Fix: the only command that stops MySQL in this state is:
# pkill -9 mysqld
- How to switch from Multi-Primary to Single-Primary mode?
a) Dissolve the existing cluster: mysql-js> cluster.dissolve({force: true})
b) On every MySQL host, change the following settings:
mysql> set global group_replication_enforce_update_everywhere_checks=OFF;
mysql> set global group_replication_single_primary_mode=ON;
c) Re-create the cluster:
mysql-js> var cluster = dba.createCluster('mysqlCluster');
mysql-js> cluster.addInstance('chianyu@svr2:3306');
mysql-js> cluster.addInstance('chianyu@svr3:3306');
- Limitations of Group Replication
  - Missing transaction locks:
  - Group Replication recommends the READ COMMITTED transaction isolation level.
  - The SERIALIZABLE isolation level is not supported in multi-primary mode.
  - Concurrent DDL and DML: in multi-primary mode, running DDL on a table on one member while updating the same table on another is not supported; conflicts with the DDL may go undetected on the other instances.
  - Cascading foreign-key constraints: in multi-primary mode, multi-level foreign-key dependencies can trigger multi-level operations and therefore undetectable conflicts; enabling group_replication_enforce_update_everywhere_checks=ON is recommended.
  - Large transactions: a transaction that stays uncommitted for more than 5 seconds can cause group communication to fail.
  - In multi-primary mode, SELECT ... FOR UPDATE can deadlock, because the lock is not shared across the whole group.
  - Partial replication is not supported: replication filters under Group Replication filter out transactions and leave the group inconsistent.
  - In MySQL 8.0.11, group_replication_enforce_update_everywhere_checks=ON is not supported in multi-primary mode.
  - If a node executes statements while replication is stopped and is then restarted, it cannot rejoin the cluster because it holds local transactions; a reset master on all nodes is needed to restart group replication from scratch.
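For the large-transaction item above, MySQL 8.0 can also reject oversized transactions up front via group_replication_transaction_size_limit instead of letting them stall group communication; a sketch (the 50 MB figure is only an example):

```sql
-- reject transactions whose serialized size exceeds ~50 MB
SET GLOBAL group_replication_transaction_size_limit = 52428800;
SHOW VARIABLES LIKE 'group_replication_transaction_size_limit';
```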
- Do not use port 3306 in a multi-instance environment
In a multi-instance deployment, if one instance uses the default port 3306, accidental operations against the wrong instance become frequent.
Deploy at most 10 instances per host.
For example:
cluster node A runs three instances on ports 3310, 3320, 3330
cluster node B runs three instances on ports 3310, 3320, 3330
cluster node C runs three instances on ports 3310, 3320, 3330
with instance data directories /data/mysql3310, /data/mysql3320, /data/mysql3330
management node D runs three MySQL Router instances on ports 3310, 3320, 3330
management node E runs three MySQL Router instances on ports 3310, 3320, 3330
with instance data directories /data/router3310, /data/router3320, /data/router3330
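A minimal my.cnf fragment per instance for the layout above might look like this (a sketch; the server_id value is made up and must simply be unique across the group):

```
# /data/mysql3310/my.cnf
[mysqld]
port      = 3310
datadir   = /data/mysql3310
socket    = /data/mysql3310/mysql.sock
server_id = 101
```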
- Split-brain scenario
When some nodes in the cluster become UNREACHABLE, the cluster can no longer reach quorum and the following situation arises: only one active node is left, and it can only serve reads; any write will hang.
mysql-js> cl.status();
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "node-3:3305",
"status": "NO_QUORUM",
"statusText": "Cluster has no quorum as visible from 'node-1:3305' and cannot process write transactions. 2 members are not active",
"topology": {
"node-1:3305": {
"address": "node-1:3305",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"node-2:3305": {
"address": "node-2:3305",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"node-3:3305": {
"address": "node-3:3305",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "UNREACHABLE"
}
}
}
}
To recover from this state, run forceQuorumUsingPartitionOf with the currently active node (if several nodes are still active, pick the primary node); that node can then serve both reads and writes, after which the remaining nodes can be rejoined to the cluster.
mysql-js> cluster.forceQuorumUsingPartitionOf('root@node-1:3305')
Restoring replicaset 'default' from loss of quorum, by using the partition composed of [nt-metra-1:3305]
Please provide the password for 'root@nt-metra-1:3305':
Restoring the InnoDB cluster ...
The InnoDB cluster was successfully restored using the partition from the instance 'root@nt-metra-1:3305'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the replicaset are removed or joined back to the group that was restored.
mysql-js> cluster.status();
{
"clusterName": "myCluster",
"defaultReplicaSet": {
"name": "default",
"primary": "nt-metra-1:3305",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"topology": {
"nt-metra-1:3305": {
"address": "nt-metra-1:3305",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"nt-metra-2:3305": {
"address": "nt-metra-2:3305",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"nt-metra-3:3305": {
"address": "nt-metra-3:3305",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
}
}
}
}
# depending on the situation, rejoin the node:
cluster.rejoinInstance("root@node-2:3305")
# or remove the failed node and add it back:
cluster.removeInstance('root@node-2:3305');
cluster.addInstance('root@node-2:3305');