線上一個5.7從庫復制中斷:
mysql>show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 49461560
Relay_Log_File: db-music1-relay-bin.000002
Relay_Log_Pos: 379466837
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1399
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 379466624
Relay_Log_Space: 2575974028
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1399
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 244331
Master_UUID: ce522aa3-6398-11e7-945f-246e96729568
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 170708 13:22:50
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: ce522aa3-6398-11e7-945f-246e96729568:1,
d199707a-5287-11e7-8f35-246e96729f10:1215032565-1218510979
Executed_Gtid_Set: 30472597-527e-11e7-a93f-246e96729ec0:1-47,
d199707a-5287-11e7-8f35-246e96729f10:1-1215544882,
d23b9cfd-5287-11e7-a307-246e96729ff0:1-9
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
查詢具體報錯:
mysql>select * from performance_schema.replication_applier_status_by_worker \G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 3
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: d199707a-5287-11e7-8f35-246e96729f10:1215544883
LAST_ERROR_NUMBER: 1399
LAST_ERROR_MESSAGE: Worker 3 failed executing transaction 'd199707a-5287-11e7-8f35-246e96729f10:1215544883' at master log mysql-bin.000001, end_log_pos 123; Error 'XAER_RMFAIL: The command cannot be executed when global transaction is in the IDLE state' on query. Default database: ''. Query: 'ROLLBACK'
LAST_ERROR_TIMESTAMP: 2017-07-08 13:22:50
1 rows in set (0.00 sec)
第一感覺很奇怪,為什么會rollback失敗呢?于是根據(jù)gtid去對應的主庫binlog去看了下,并沒有任何rollback語句:
# at 379611076
#170707 20:28:38 server id 61481 end_log_pos 379611141 CRC32 0x65240fde GTID last_committed=512318 sequence_number=512319
SET @@SESSION.GTID_NEXT= 'd199707a-5287-11e7-8f35-246e96729f10:1215544883'/*!*/;
# at 379611141
#170707 20:28:38 server id 61481 end_log_pos 379611269 CRC32 0x19e0ec66 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=524288/*!*/;
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379611269
#170707 20:28:38 server id 61481 end_log_pos 379611392 CRC32 0xd37354d4 Table_map: `XXXX`.`XXXX` mapped to number 123
# at 379611392
#170707 20:28:38 server id 61481 end_log_pos 379611910 CRC32 0xec28e0a5 Update_rows: table id 123 flags: STMT_END_F
### UPDATE XXXX.XXXX
### WHERE
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499148021048 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=114 /* INT meta=0 nullable=1 is_null=0 */
### SET
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */```
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499430518207 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=115 /* INT meta=0 nullable=1 is_null=0 */
# at 379611910```
#170707 20:28:38 server id 61481 end_log_pos 379612036 CRC32 0x8f8d4a35 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379612036
#170707 20:28:38 server id 61481 end_log_pos 379612096 CRC32 0xd4966095 XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187
XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379614541
#170707 20:28:38 server id 61481 end_log_pos 379614687 CRC32 0x5273d888 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
看下本地的relay log,找到這個事務的gtid
# at 379611289
#170707 20:28:38 server id 61481 end_log_pos 379611141 CRC32 0x65240fde GTID last_committed=512318 sequence_number=512319
SET @@SESSION.GTID_NEXT= 'd199707a-5287-11e7-8f35-246e96729f10:1215544883'/*!*/;
# at 379611354
#170707 20:28:38 server id 61481 end_log_pos 379611269 CRC32 0x19e0ec66 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=524288/*!*/;
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379611482
#170707 20:28:38 server id 61481 end_log_pos 379611392 CRC32 0xd37354d4 Table_map: XXXX.XXXX mapped to number 123
# at 379611605
#170707 20:28:38 server id 61481 end_log_pos 379611910 CRC32 0xec28e0a5 Update_rows: table id 123 flags: STMT_END_F
BINLOG '
dn5fWRMp8AAAewAAAABpoBYAAHsAAAAAAAEADGRiX3BsYXlsaXN0MQAOTXVzaWNfUGxheWxpc3QA
KQgPCAgICAMDAwgPAwMIAw8IAQMPCAEPAQMIAQEDCAEBDwMIAQQDAwMIDf0C/QL9AgAMdwGAAQT6
////vwHUVHPT
dn5fWRgp8AAABgIAAAZroBYAAHsAAAAAAAEAKf///////////////wCECAAA/tDzPBwAAAAAGQDm
jL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAADhtLwxdAQAAAAAAAAAAAABy
AAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAA3gv6G10BAAABAAAAgQEAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAHAAAAAAAAAAAAAAAAAAAAAAADhtLwxdAQAAAAAgAGIyYTcxOWE4NjdmZjNkNmJlZWI5
YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAAAACECAAA/tDz
PBwAAAAAGQDmjL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAAL/9BR1dAQAA
AAAAAAAAAABzAAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAAv/0FHV0BAAABAAAAgQEAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAHEAAAAAAAAAAAAAAAAAAAAAAL/9BR1dAQAAAAAgAGIyYTcxOWE4Njdm
ZjNkNmJlZWI5YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAA
AKXgKOw=
'/*!*/;
### UPDATE XXXX.XXXX
### WHERE
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499148021048 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=114 /* INT meta=0 nullable=1 is_null=0 */
### SET
### @1=473756624 /* LONGINT meta=0 nullable=0 is_null=0 */
### @2='挽情1111喜歡的音樂' /* VARSTRING(765) meta=765 nullable=1 is_null=0 */
### @3=338811606 /* LONGINT meta=0 nullable=0 is_null=0 */
### @4=1475019621016 /* LONGINT meta=0 nullable=1 is_null=0 */
### @5=1499430518207 /* LONGINT meta=0 nullable=1 is_null=0 */
### @6=0 /* LONGINT meta=0 nullable=1 is_null=0 */
### @7=115 /* INT meta=0 nullable=1 is_null=0 */
# at 379612123
#170707 20:28:38 server id 61481 end_log_pos 379612036 CRC32 0x8f8d4a35 Query thread_id=758257 exec_time=59329 error_code=0
SET TIMESTAMP=1499430518/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187
/*!*/;
# at 379612249
#170708 13:18:53 server id 254331 end_log_pos 379612306 CRC32 0xf2c69619 Rotate to db-music1-relay-bin.000003 pos: 4
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
到這里,這個relay log日志文件結束了。很顯然問題也找到了,就是執(zhí)行
ROLLBACK /* added by mysqlbinlog */ /*!*/;
出錯了。
首先 我們看到 這個rollback是MySQL自己加上去的,那么為什么要加呢?
mysql為了保證一個事務只在一個binlog里,所以當Binlog或者relay log發(fā)生截斷時,最后一個事務要么commit,要么rollback,如果rollback,那么下一個binlog或者relay log會把這個事務重做一遍,保證這個事務不會丟。
由于xa事務無法直接rollback,而需要xa rollback ‘XXX’,所以復制就停了。
怎么修復?是不是直接跳過這個rollback就行了?
我們來試一下,跳過這個rollback:
STOP SLAVE;
SET GTID_NEXT="d199707a-5287-11e7-8f35-246e96729f10:1215544883";
BEGIN; COMMIT;
SET GTID_NEXT="AUTOMATIC";
START SLAVE;
這里GTID_NEXT值不能用show slave status的里executed值,得用具體報錯的停止的gtid
但是,show slave status看到,還是有報錯:
Error 'XAER_NOTA: Unknown XID' on query. Default database: 'XXXX'. Query: 'XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187'
為什么又報這個事務commit找不到XID呢?
之前說過,在relaylog截斷的時候,如果事務沒有commit,會自動在最后加rollback,在下一個relay log開始的時候重新做一次這個事務,按理說我們跳過這個rollback,在下個relaylog會被重做,為啥會在commit的時候找不到xid呢?
我們看到我們跳過的gtid原來就是重做這個事務到PREPARE階段的gtid,原來,rollback是沒有gtid的,所以我們實際上就是把這個事務到PREPARE階段的gtid給跳過了,commit的時候肯定會找不到xid,接著怎么修復?
- 手動把這個事務執(zhí)行一遍,手動commit。
- 指定 ‘/!/;’為結束符,每個binlog開頭都有
BINLOG ' 8HFvWQ+8fQQAdwAAAHsAAAAAAAQANS43LjE4LTE1LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AUvdRu4= '/*!*/;
- 設置gtid_next模式
SET @@SESSION.GTID_NEXT= 'AUTOMATIC’ ;
- 開始事務
XA START X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 事務內(nèi)容
BINLOG ' dn5fWRMp8AAAewAAAABpoBYAAHsAAAAAAAEADGRiX3BsYXlsaXN0MQAOTXVzaWNfUGxheWxpc3QA KQgPCAgICAMDAwgPAwMIAw8IAQMPCAEPAQMIAQEDCAEBDwMIAQQDAwMIDf0C/QL9AgAMdwGAAQT6 ////vwHUVHPT dn5fWRgp8AAABgIAAAZroBYAAHsAAAAAAAEAKf///////////////wCECAAA/tDzPBwAAAAAGQDm jL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAADhtLwxdAQAAAAAAAAAAAABy AAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAA3gv6G10BAAABAAAAgQEAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAHAAAAAAAAAAAAAAAAAAAAAAADhtLwxdAQAAAAAgAGIyYTcxOWE4NjdmZjNkNmJlZWI5 YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAAAACECAAA/tDz PBwAAAAAGQDmjL3mg4UxMTEx5Zac5qyi55qE6Z+z5LmQ1toxFAAAAACYQgVuVwEAAL/9BR1dAQAA AAAAAAAAAABzAAAAAAAAAAAAAADhyREAAABCAAAAAAAFAAAAv/0FHV0BAAABAAAAgQEAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAHEAAAAAAAAAAAAAAAAAAAAAAL/9BR1dAQAAAAAgAGIyYTcxOWE4Njdm ZjNkNmJlZWI5YTRlZjI1NGVjNDliAAAAAAAAAAAAAAAAAGYGrEQAAAAAAAAAAAAAAAAAAAAAAAAA AKXgKOw= '/*!*/;
- 結束 & prepare事務
XA END X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/; XA PREPARE X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 提交事務
XA COMMIT X'3230375f313533353431363835313835373138305f',X'353034',39187 /*!*/;
- 指定 ‘/!/;’為結束符,每個binlog開頭都有
- 跳過commit這個gtid
- start slave
這樣,才算是完整的修復了這個主從復制,并且數(shù)據(jù)保持一致。
為什么這是5.7的一個bug呢?5.7之前的版本因為relaylog被截斷并不會出現(xiàn)這個bug。
5.7對xa事務的binlog記錄方式做了修改,把 xa start,xa end,xa prepare放到一個event里,xa commit又是另外一個event。而在之前的MySQL版本中,整個xa事務從start到commit都是在一個event中,所以其他版本并沒有問題。
最后一個問題:5.7為啥要把xa事務拆成兩個event?簡單的講是為了數(shù)據(jù)安全性。
5.5或者5.6假設下面一個場景:
- 主庫執(zhí)行 xa事務直到prepare
- kill主庫。
- 拉起主庫,再xa commit停機前的事務
主庫提交成功,修改的數(shù)據(jù)也是沒問題的,再去從庫,會發(fā)現(xiàn),從庫的數(shù)據(jù)并沒有變化。WHY?
MySQL在某個分布式事務prepare成功后宕機,宕機前操作該事務的連接并沒有斷開(如果在宕機前斷開連接,事務會被MySQL自動回滾),這個時候已經(jīng)prepare的事務并不會被回滾,所以在MySQL重新啟動后,引擎層通過recover機制能恢復該事務。當然該事務的Binlog已經(jīng)在宕機過程中被丟失,這個時候,如果去提交,則會造成主從數(shù)據(jù)的不一致, 即提交沒有記錄Binlog,從上丟失該條數(shù)據(jù)。
正因為5.7之前版本的xa事務存在這個bug,5.7后做了修復。從XA START到XA PREPARE之間的操作都被記錄到了Master的Binlog中,然后通過復制關系傳到了Slave上。也就是說5.7開始,MySQL對于XA事務,在prepare的時候就完成了寫B(tài)inlog的操作,通過新增一種叫 XA_prepare_log_event 的event類型來實現(xiàn)。
其實 MySQL5.7在xa事務上遠不止這個bug,后面再來慢慢總結。