Environment
Database version:
Release 20c 20.1.1 revision(31618)
Cluster information
gSQL> select STATUS,LOCAL_MEMBER_NAME from x$instance;
STATUS LOCAL_MEMBER_NAME
------ -----------------
OPEN G2N2
OPEN G1N2
OPEN G2N1
OPEN G1N1
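Enabling archive log mode
First check whether archiving is enabled on every node in the cluster; without it, backups cannot be run. To enable it, start the database on each node in mount state, run alter database archivelog, and then open the database.
I wrote a simple script to query the archiving status of all nodes: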
[goldilocks@gs05 ~]$ sh select.sh "select ARCHIVELOG_MODE from v\$archivelog;"
192.168.149.131
ARCHIVELOG_MODE
---------------
ARCHIVELOG
192.168.149.132
ARCHIVELOG_MODE
---------------
ARCHIVELOG
192.168.149.133
ARCHIVELOG_MODE
---------------
ARCHIVELOG
192.168.149.134
ARCHIVELOG_MODE
---------------
ARCHIVELOG
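The select.sh helper itself is not shown in this post. A minimal sketch of what such a script might look like, assuming passwordless ssh between the nodes and a ~/.bash_profile that puts gsql on the PATH; the function names select_on_all and all_archivelog are hypothetical:

```shell
# Hypothetical stand-in for select.sh; the node list mirrors the IPs above.
NODES="192.168.149.131 192.168.149.132 192.168.149.133 192.168.149.134"

# Run one SQL statement on every node, printing the node address first.
# The statement travels on stdin, so "$" in view names needs no extra quoting.
select_on_all() {
  local node
  for node in $NODES; do
    echo "$node"
    echo "$1" | ssh "$node" "source ~/.bash_profile; gsql --no-prompt"
  done
}

# Read that output on stdin; succeed only if every node printed ARCHIVELOG.
all_archivelog() {
  local count
  count=$(awk 'NF == 1 && $1 == "ARCHIVELOG" { n++ } END { print n + 0 }')
  set -- $NODES
  [ "$count" -eq "$#" ]
}

# Usage:
#   select_on_all 'select ARCHIVELOG_MODE from v$archivelog;' | all_archivelog \
#     && echo "all nodes are in ARCHIVELOG mode"
```

A node reporting NOARCHIVELOG, or a node that does not answer, makes the count fall short and the check fail.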
Full backup
Backup with no data changes
Start with the simplest case: assume no data changes while the backup runs, so copying all of the database files is enough to produce a backup.
Taking the backup
Create a test table and insert one row:
gSQL> create table t1 (id int,time timestamp);
Table created.
gSQL> insert into t1 values (100,sysdate);
1 row created.
gSQL> commit;
Commit complete.
gSQL> select * from t1;
ID TIME
--- --------------------------
100 2020-10-10 23:22:32.000000
1 row selected.
Write the backup script:
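#!/bin/bash
source ~/.bash_profile
# Put the database into backup mode before copying any files
echo "alter database begin backup;"|gsql --no-prompt
echo "begin backup"
# Copy each node's data directory in parallel. $GOLDILOCKS_DATA is expanded
# locally by the double quotes, so every node must use the same path, and
# ~/backup is assumed to exist on every node.
ssh gs05 "source ~/.bash_profile;cp -r $GOLDILOCKS_DATA/* ~/backup" &
ssh gs06 "source ~/.bash_profile;cp -r $GOLDILOCKS_DATA/* ~/backup" &
ssh gs07 "source ~/.bash_profile;cp -r $GOLDILOCKS_DATA/* ~/backup" &
ssh gs08 "source ~/.bash_profile;cp -r $GOLDILOCKS_DATA/* ~/backup" &
# Wait for all four copies to finish
wait
# Leave backup mode
echo "alter database end backup;"|gsql --no-prompt
echo "end backup "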
The principle is simple: after alter database begin backup is executed, copy every database file, including the data files, the control files, and the other files under the wal directory.
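One weakness of the script above is that if it is interrupted between the two statements, end backup never runs. A hedged sketch of a variant that always leaves backup mode and reports whether any copy failed; it uses the same statements and host names as above, while the function names backup_all and end_backup are hypothetical:

```shell
BACKUP_NODES="gs05 gs06 gs07 gs08"

end_backup() {
  echo "alter database end backup;" | gsql --no-prompt
  echo "end backup"
}

backup_all() {
  local node pid pids rc
  echo "alter database begin backup;" | gsql --no-prompt
  echo "begin backup"
  # If the script is interrupted from here on, still leave backup mode.
  trap 'end_backup; exit 1' INT TERM
  pids=""
  for node in $BACKUP_NODES; do
    # Single quotes: $GOLDILOCKS_DATA is expanded on the remote node.
    ssh "$node" 'source ~/.bash_profile; cp -r "$GOLDILOCKS_DATA"/* ~/backup' &
    pids="$pids $!"
  done
  rc=0
  for pid in $pids; do
    wait "$pid" || rc=1   # remember whether any copy failed
  done
  trap - INT TERM
  end_backup
  return "$rc"
}

# Usage: backup_all || echo "at least one node failed to copy" >&2
```

Waiting on each process ID separately, instead of a bare wait, is what makes the exit status of every copy visible to the caller.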
Once the backup has finished, insert one more test row:
gSQL> insert into t1 values (200,sysdate);
1 row created.
gSQL> commit;
Commit complete.
gSQL> select * from t1;
ID TIME
--- --------------------------
100 2020-10-10 23:22:32.000000
200 2020-10-11 10:53:14.000000
2 rows selected.
Restore
Simulate a cluster failure.
Delete the data files on every node in the cluster to simulate the worst case.
[goldilocks@gs05 ~]$ run_cmd.sh "rm -r ~/goldilocks_data"
Startup now fails:
[goldilocks@gs05 ~]$ gsql
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Connected to an idle instance.
gSQL> startup
ERR-HY000(13025): Property file does not exist (/home/goldilocks/goldilocks_data/conf/goldilocks.properties.conf)
gSQL>
Starting the restore
Copy the backed-up database files back to their original location:
run_cmd.sh "mkdir ~/goldilocks_data"
run_cmd.sh "cp -r ~/backup/* ~/goldilocks_data"
[goldilocks@gs05 goldilocks_data]$ gsqlnet
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Connected to an idle instance.
gSQL> \cstartup
Startup success
gSQL> select * from x$instance;
VERSION STARTUP_TIME STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ------ ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:37.650265 OPEN 1008 TRUE 1 1 G1N1 0
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.239340 OPEN 1007 TRUE 2 3 G2N2 2
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.238691 OPEN 1007 TRUE 2 5 G2N1 1
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.234501 OPEN 1005 TRUE 1 4 G1N2 3
4 rows selected.
gSQL> select * from t1;
ID TIME
--- --------------------------
100 2020-10-10 23:22:32.000000
1 row selected.
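The row inserted before the backup is present; the row inserted after the backup is gone.
Backup with data changes
The database supports online backup, so users can keep working while a backup runs. As I understand the documentation, after alter database begin backup is executed, dirty pages are no longer flushed to the data files, not even by a manual checkpoint, while redo logs continue to be written and archived.
- Normal checkpoint flow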
[2020-10-12 14:07:14.444515 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin
[2020-10-12 14:07:14.445851 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 0, datafile : 0 )
[2020-10-12 14:07:14.474179 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 1, datafile : 0 )
[2020-10-12 14:07:14.474493 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 0 )
[2020-10-12 14:07:14.474706 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 1 )
[2020-10-12 14:07:14.475220 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 5, datafile : 0 )
[2020-10-12 14:07:14.476225 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 6, datafile : 0 )
[2020-10-12 14:07:14.485063 INSTANCE(GOLDILOCKS) THREAD(62993,140466448217856)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(262521), min flushed lsn (262543), flushed page count(0)
[2020-10-12 14:07:14.495049 INSTANCE(GOLDILOCKS) THREAD(62993,140466485982976)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(262543), flushed page count(2048)]
[2020-10-12 14:07:14.495178 INSTANCE(GOLDILOCKS) THREAD(62993,140466051852032)] [INFORMATION]
[ARCHIVING] stable lsn(262543)
[2020-10-12 14:07:14.547931 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin - checkpoint lid(15.2915.13), checkpoint lsn(262544), oldest lsn(262544)
[2020-10-12 14:07:14.548092 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] body - checkpoint lid(-1.0.0), checkpoint lsn(-1), active transaction count(0)
[2020-10-12 14:07:14.548115 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end - checkpoint lid(15.2915.117), checkpoint lsn(262545)
[2020-10-12 14:07:14.548126 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] flush redo log
[2020-10-12 14:07:14.552784 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] save control file
[2020-10-12 14:07:14.565377 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end
- Checkpoint flow after alter database begin backup
[2020-10-12 14:09:03.485033 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin
[2020-10-12 14:09:03.496058 INSTANCE(GOLDILOCKS) THREAD(62993,140466448217856)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(262545), min flushed lsn (262547), flushed page count(0)
[2020-10-12 14:09:03.506912 INSTANCE(GOLDILOCKS) THREAD(62993,140466485982976)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(262547), flushed page count(0)]
[2020-10-12 14:09:03.507426 INSTANCE(GOLDILOCKS) THREAD(62993,140466051852032)] [INFORMATION]
[ARCHIVING] stable lsn(262547)
[2020-10-12 14:09:03.532187 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin - checkpoint lid(15.2917.13), checkpoint lsn(262548), oldest lsn(262548)
[2020-10-12 14:09:03.532302 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] body - checkpoint lid(-1.0.0), checkpoint lsn(-1), active transaction count(0)
[2020-10-12 14:09:03.532318 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end - checkpoint lid(15.2917.117), checkpoint lsn(262549)
[2020-10-12 14:09:03.532326 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] flush redo log
[2020-10-12 14:09:03.535894 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] save control file
[2020-10-12 14:09:03.589138 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end
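In the normal trace the IO SLAVE flushes six datafiles and the PAGE FLUSHER reports 2048 pages; in the trace taken after begin backup there are no datafile flushes and the page count is 0. A small sketch for checking this mechanically, assuming the trace text is fed on stdin (the function name checkpoint_flush_summary is hypothetical):

```shell
# Count "[IO SLAVE] flush datafile" entries and sum the page counts
# reported on "[PAGE FLUSHER]" lines of a trace read from stdin.
checkpoint_flush_summary() {
  awk '
    /\[IO SLAVE\] flush datafile/ { files++ }
    /\[PAGE FLUSHER\]/ {
      # "flushed page count(" is 19 characters; the digits follow it
      if (match($0, /flushed page count\([0-9]+\)/))
        pages += substr($0, RSTART + 19, RLENGTH - 20)
    }
    END { printf "datafile_flushes=%d flushed_pages=%d\n", files, pages }
  '
}

# Usage: checkpoint_flush_summary < trace_excerpt.txt
```

Only the PAGE FLUSHER lines are summed, because the CHECKPOINT lines carry a separate "flushed page count" field of their own.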
Simulating data changes
A simple script simulates users continuously writing to the database:
for i in {1..100}
do
echo "insert into t1 values ($i,sysdate);"|gsql sys gliese --no-prompt
sleep 1
done
Back up the database while the script is running:
[goldilocks@gs05 ~]$ sh backup.sh
Database altered.
begin backup
Database altered.
end backup
After the insert script finishes, table t1 contains 100 rows:
gSQL> select count(*) from t1;
COUNT(*)
--------
100
1 row selected.
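Data was being written throughout this backup. The backup therefore consists of the data already flushed to disk before the backup began plus the redo log written while it ran. Because redo is appended in real time, the redo files copied from the different nodes are not guaranteed to be consistent with one another, so the restore needs some extra handling.
Starting the restore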
run_cmd.sh "mv ~/goldilocks_data/ ~/goldilocks_data_bak "
run_cmd.sh "cp -r ~/backup/ ~/goldilocks_data"
After starting the listener on each node, try to start the cluster directly from gsqlnet:
gSQL> \cstartup
ERR-42000(16403): of the total '4' nodes, '1' nodes failed to join the global database
ERR-HY000(40061): currently connected node is inactive
Startup success
One node failed to join the cluster:
gSQL> select * from x$instance;
ERR-HY000(16354): connection of member 'G2N1' is broken
G2N1 has not joined the cluster:
select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 853.26.489
G1N2 6 ACTIVE ACTIVE 853.26.0
G2N1 5 INACTIVE INACTIVE -1.-1.-1
G2N2 3 ACTIVE ACTIVE 853.36.0
Checking G2N1 directly shows that it is in LOCAL OPEN state, so it has indeed not joined the cluster.
gSQL> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
853.24.1339
1 row selected.
gSQL> select * from x$instance@local;
VERSION STARTUP_TIME STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ---------- ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-14 16:14:40.564452 LOCAL OPEN 1007 TRUE 2 5 G2N1 1
1 row selected.
G2N1's SCN is 853.24.1339, while G2N2 in the same group is at 853.36.1493. The DCN differs within the group, so the group's data is inconsistent, and G2N2 holds more data than G2N1. The likely cause is that the redo log files copied from the two nodes did not have fully identical contents. Dumping the redo logs confirms this.
G2N1 node
[goldilocks@gs07 wal]$ gdump control control_1.ctl -s log
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: control_1.ctl
TYPE: CONTROLFILE
TIME: 2020-10-14 17:09:22.501152
===========================================================
[LOG SECTION]
-----------------------------------------------------------
DATABASE CREATION TIME : 2020-09-30 11:06:01.643110
[CHECKPOINT]
LID : 13,3564,13
LSN : 61523
RESETLOG LSN : -1
ARCHIVELOG MODE : ARCHIVELOG
LAST INACTIVATED LOGFILE SEQUENCE : 12
[LOG STREAM]
STATE : ACTIVE
GROUP COUNT : 4
BLOCK SIZE : 512
FILE SEQUENCE : 13
[LOG GROUP #0]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 12
PREV LAST LSN : 54716
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_0_0.log"
[LOG GROUP #1]
STATE : CURRENT
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 13
PREV LAST LSN : 55104
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_1_0.log"
[LOG GROUP #2]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 10
PREV LAST LSN : 48765
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_2_0.log"
[LOG GROUP #3]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 11
PREV LAST LSN : 54713
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_3_0.log"
===========================================================
TIME: 2020-10-14 17:09:22.512727
===========================================================
In the backup, the redo log whose state is CURRENT is redo_1_0.log, and the checkpoint LSN is 61523.
Next, look at how many redo records were written to disk after the checkpoint:
[goldilocks@gs07 wal]$ gdump log redo_1_0.log -n 61523
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: redo_1_0.log
TYPE: LOGFILE PAGE
TIME: 2020-10-14 17:11:50.149410
===========================================================
===========================================================
[LOG FILE HEADER]
-----------------------------------------------------------
LOG_GROUP_ID : 1
BLOCK_SIZE : 512
FILE_SIZE : 104857600
FILE_SEQUENCE : 13
PREV_LAST_LSN : 55104
CREATION TIME : 2020-10-11 22:07:12.290608
SIGNATURE : DEE8DD1A02C911EBA465E9A84E6E413C
===========================================================
[LOG #0] : BLOCK(3564), LSN(61523), SIZE(64), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(746), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_BEGIN), TIME(2020-10-12 14:25:53.233884), SIZE(48), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
53F0000000000000 5503000000000000 1000000000000000 2D05000000000000 S....... U....... ........ -.......
C802000011000000 DC6FCA5E73B10500 ........ .o.^s...
[LOG #1] : BLOCK(3564), LSN(61524), SIZE(16), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(0), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_END), SIZE(0), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
[LOG #2] : BLOCK(3565), LSN(61525), SIZE(561), PIECE_COUNT(7), TRANS_ID(83D0003003A), TRANS_SEQ(746), RID(0,0,1726)
[PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
0D00030004000000 B4EF000000000000 D598FD2D7FB00500 02000400C30A0000 ........ ........ ...-.... ........
00000000BE060000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 02001000E60A0000 ........ ........ ........
[PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2755)
0400030000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 00000080 ........ ........ ....
[PIECE #2] : TYPE(BYTES), SIZE(4), CLASS(RECOVERY), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2790)
04000300 ....
[PIECE #3] : TYPE(INIT_PAGE_BODY), SIZE(42), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,120,2790)
1100A00010000408 0400F81F00000000 0100000000000000 0100000000000000 ........ ........ ........ ........
0900000000000000 0100 ........ ..
[PIECE #4] : TYPE(INSERT_TRANSACTION_RECORD), SIZE(128), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,3,2109)
0200060967000000 670000006800FFFF 010003003D080000 3A0003003D080000 ....g... g...h... ....=... :...=...
00000000FFFFFFFF 0400E60200000000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF ........ ........ ........ ........
FFFFFFFFFFFFFFFF 5503000000000000 1000000000000000 2D05000000000000 ........ U....... ........ -.......
FFFFFFFFFFFFFFFF 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
[PIECE #5] : TYPE(INSERT_UNDO_RECORD), SIZE(24), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,4,2109)
0000090A67000000 BE06000000001000 02000000E60A0000 ....g... ........ ........
[PIECE #6] : TYPE(HEAP_INSERT), SIZE(79), CLASS(TABLE), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
3A0003003D080000 5503000000000000 1000000000000000 2D05000000000000 :...=... U....... ........ -.......
0000000044010000 0000000000000002 1500000000000002 C1290880C01B97B7 ....D... ........ ........ .)......
D7F202EB8100001A 00000004000100 ........ .......
... (some records omitted) ...
[LOG #16] : BLOCK(3579), LSN(61539), SIZE(561), PIECE_COUNT(7), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,1726)
[PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,3238)
0D00030004000000 E5EF000000000000 D598FD2D7FB00500 02000400830C0000 ........ ........ ...-.... ........
00000000BE060000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 02001700A60C0000 ........ ........ ........
[PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,3203)
0400030000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 00000080 ........ ........ ....
[PIECE #2] : TYPE(BYTES), SIZE(4), CLASS(RECOVERY), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,3238)
04000300 ....
[PIECE #3] : TYPE(INIT_PAGE_BODY), SIZE(42), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,120,3238)
1100A00010000408 0400F81F00000000 0100000000000000 0100000000000000 ........ ........ ........ ........
0900000000000000 0100 ........ ..
[PIECE #4] : TYPE(INSERT_TRANSACTION_RECORD), SIZE(128), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,24,2109)
0200060967000000 670000006800FFFF 010018003D080000 3A0018003D080000 ....g... g...h... ....=... :...=...
00000000FFFFFFFF 0400ED0200000000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF ........ ........ ........ ........
FFFFFFFFFFFFFFFF 5503000000000000 1700000000000000 2D05000000000000 ........ U....... ........ -.......
FFFFFFFFFFFFFFFF 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
[PIECE #5] : TYPE(INSERT_UNDO_RECORD), SIZE(24), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,25,2109)
0000090A67000000 BE06000000001700 02000000A60C0000 ....g... ........ ........
[PIECE #6] : TYPE(HEAP_INSERT), SIZE(79), CLASS(TABLE), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,3238)
3A0018003D080000 5503000000000000 1700000000000000 2D05000000000000 :...=... U....... ........ -.......
0000000044010000 0000000000000002 1500000000000002 C13008C01DE297B7 ....D... ........ ........ .0......
D7F202F98100001A 00000004000100 ........ .......
[LOG #17] : BLOCK(3580), LSN(61540), SIZE(170), PIECE_COUNT(2), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
5503000000000000 1800000000000000 2D05000000000000 010000000000BE06 U....... ........ -....... ........
0000 ..
[PIECE #1] : TYPE(COMMIT), TRANS_ID(83D0018003A), TIME(2020-10-12 14:26:07.269999), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,24,2109)
3A0018003D080000 00000000FFFFFFFF 0400ED0200000000 5503000000000000 :...=... ........ ........ U.......
1800000000000000 2D05000000000000 5503000000000000 1700000000000000 ........ -....... U....... ........
2D05000000000000 63F0000000000000 6F9CA05F73B10500 CF02000011000000 -....... c....... o.._s... ........
0000000000000000 ........
The dump ends at LOG #17, that is, 17 redo records after the checkpoint record (LOG #0).
G2N2 node
[goldilocks@gs08 wal]$ gdump control control_1.ctl -s log
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: control_1.ctl
TYPE: CONTROLFILE
TIME: 2020-10-14 17:20:34.265815
===========================================================
[LOG SECTION]
-----------------------------------------------------------
DATABASE CREATION TIME : 2020-09-27 08:51:16.608945
[CHECKPOINT]
LID : 15,3253,13
LSN : 272885
RESETLOG LSN : -1
ARCHIVELOG MODE : ARCHIVELOG
LAST INACTIVATED LOGFILE SEQUENCE : 14
[LOG STREAM]
STATE : ACTIVE
GROUP COUNT : 4
BLOCK SIZE : 512
FILE SEQUENCE : 15
[LOG GROUP #0]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 12
PREV LAST LSN : 260402
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_0_0.log"
[LOG GROUP #1]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 13
PREV LAST LSN : 266354
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_1_0.log"
[LOG GROUP #2]
STATE : INACTIVE
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 14
PREV LAST LSN : 266357
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_2_0.log"
[LOG GROUP #3]
STATE : CURRENT
SIZE : 104857600
MEMBER COUNT : 1
FILE SEQUENCE : 15
PREV LAST LSN : 266617
MEMBER #0 : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_3_0.log"
===========================================================
TIME: 2020-10-14 17:20:34.287068
===========================================================
redo_3_0.log is the CURRENT log, and the checkpoint LSN is 272885.
[goldilocks@gs08 wal]$ gdump log redo_3_0.log -n 272885
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: redo_3_0.log
TYPE: LOGFILE PAGE
TIME: 2020-10-14 17:22:07.267428
===========================================================
===========================================================
[LOG FILE HEADER]
-----------------------------------------------------------
LOG_GROUP_ID : 3
BLOCK_SIZE : 512
FILE_SIZE : 104857600
FILE_SEQUENCE : 15
PREV_LAST_LSN : 266617
CREATION TIME : 2020-10-11 22:07:12.247585
SIGNATURE : 8C9D5600005B11EB8554BF163FAF3B79
===========================================================
[LOG #0] : BLOCK(3253), LSN(272885), SIZE(64), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(1537), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_BEGIN), TIME(2020-10-12 14:25:53.411774), SIZE(48), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
F529040000000000 5503000000000000 1000000000000000 C605000000000000 .)...... U....... ........ ........
F305000019000000 BE26CD5E73B10500 ........ .&.^s...
[LOG #1] : BLOCK(3253), LSN(272886), SIZE(16), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(0), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_END), SIZE(0), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
[LOG #2] : BLOCK(3254), LSN(272887), SIZE(561), PIECE_COUNT(7), TRANS_ID(3A80031003A), TRANS_SEQ(1537), RID(0,0,1591)
[PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
0D00030004000000 5729040000000000 7D4AEDF240B00500 02000400C30A0000 ........ W)...... }J..@... ........
0000000037060000 0000000000000000 0000000000000000 0000000000000000 ....7... ........ ........ ........
0000000000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
0000000000000000 0000000000000000 02001000E60A0000 ........ ........ ........
[PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2755)
0400030000000000 0000000000000000 0000000000000000 0000000000000000 ........ ........ ........ ........
...
[LOG #41] : BLOCK(3293), LSN(272926), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A8006A003A), TRANS_SEQ(1556), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
5503000000000000 2400000000000000 C605000000000000 0100000000003706 U....... $....... ........ ......7.
0000 ..
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A8006A003A), TIME(2020-10-12 14:26:36.482417), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,106,936)
3A006A00A8030000 00000000FFFFFFFF 0400050300000000 5503000000000000 :.j..... ........ ........ U.......
2400000000000000 C605000000000000 5503000000000000 2300000000000000 $....... ........ U....... #.......
C605000000000000 1D2A040000000000 715B5E6173B10500 0606000019000000 ........ .*...... q[^as... ........
0000000000000000 ........
G2N2 has 41 records after the checkpoint, more than G2N1.
Next, compare the SCN carried by the last commit record on each node.
G2N1
[LOG #17] : BLOCK(3580), LSN(61540), SIZE(170), PIECE_COUNT(2), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
5503000000000000 1800000000000000 2D05000000000000 010000000000BE06 U....... ........ -....... ........
0000 ..
[PIECE #1] : TYPE(COMMIT), TRANS_ID(83D0018003A), TIME(2020-10-12 14:26:07.269999), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,24,2109)
3A0018003D080000 00000000FFFFFFFF 0400ED0200000000 5503000000000000 :...=... ........ ........ U.......
1800000000000000 2D05000000000000 5503000000000000 1700000000000000 ........ -....... U....... ........
2D05000000000000 63F0000000000000 6F9CA05F73B10500 CF02000011000000 -....... c....... o.._s... ........
0000000000000000
G2N2
[LOG #41] : BLOCK(3293), LSN(272926), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A8006A003A), TRANS_SEQ(1556), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
5503000000000000 2400000000000000 C605000000000000 0100000000003706 U....... $....... ........ ......7.
0000 ..
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A8006A003A), TIME(2020-10-12 14:26:36.482417), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,106,936)
3A006A00A8030000 00000000FFFFFFFF 0400050300000000 5503000000000000 :.j..... ........ ........ U.......
2400000000000000 C605000000000000 5503000000000000 2300000000000000 $....... ........ U....... #.......
C605000000000000 1D2A040000000000 715B5E6173B10500 0606000019000000 ........ .*...... q[^as... ........
0000000000000000
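The commit redo record is 104 bytes long. Bytes 49-56 hold the GCN (5503000000000000), and the two 8-byte fields that follow hold the DCN (1700000000000000) and the LCN (2D05000000000000).
Because x86 CPUs store integers in little-endian order (least significant byte first), 5503 must be read as 0x0355, which is 853 in decimal. So G2N1's GCN is 853 and its DCN is 23 (0x17). G2N2's GCN is also 853, but its DCN is 35 (0x23), which is 12 intra-group transactions ahead of G2N1. In other words, G2N2 holds more data than G2N1.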
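The byte-order conversion can be done in the shell instead of by hand. A minimal sketch, assuming each field is passed exactly as gdump prints it (an even number of hex digits); the function name le_hex_to_dec is hypothetical:

```shell
# Convert a little-endian hex field, as printed by gdump, to decimal.
le_hex_to_dec() {
  local be
  # Walk the string backwards two hex digits (one byte) at a time
  be=$(echo "$1" | awk '{
    s = ""
    for (i = length($0) - 1; i >= 1; i -= 2) s = s substr($0, i, 2)
    print s
  }')
  printf '%d\n' "0x$be"
}

le_hex_to_dec 5503000000000000   # GCN on both nodes: prints 853
le_hex_to_dec 1700000000000000   # DCN on G2N1:       prints 23
le_hex_to_dec 2300000000000000   # DCN on G2N2:       prints 35
```

Fields that use all 64 bits, such as FFFFFFFFFFFFFFFF, may exceed what the shell's printf can represent as a signed integer, so the sketch is only meant for counters like the ones above.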
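So when the cluster is started directly, G2N1 fails to join because the data within group 2 is inconsistent; the two nodes have to be brought into agreement first.
There are two options: (1) G2N2 discards the data that G2N1 lacks, or (2) G2N1 catches up on the data it is missing. Normally catching up is preferable, since data is valuable. For this exercise, both approaches are tried.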
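Recovery by discarding data
Start each node separately to LOCAL OPEN state, then check that the GCN is the same on all nodes and that the DCN is the same within each group.
From the steps above we know that group 1 is internally consistent, while group 2 still has to be made consistent and G2N2 holds more data than G2N1. So G2N2 needs an incomplete recovery up to G2N1's last redo record.
Find the LSN that G2N2 must be recovered to, that is, the commit carrying the same GCN+DCN pair as G2N1's last commit (5503000000000000 1700000000000000):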
[goldilocks@gs08 wal]$ gdump log redo_3_0.log -n 272885|grep -B 5 "5503000000000000 1700000000000000"
...
[LOG #17] : BLOCK(3269), LSN(272902), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A80046003A), TRANS_SEQ(1544), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
5503000000000000 1800000000000000 C605000000000000 0100000000003706 U....... ........ ........ ......7.
0000 ..
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A80046003A), TIME(2020-10-12 14:26:07.272916), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,70,936)
3A004600A8030000 00000000FFFFFFFF 0400ED0200000000 5503000000000000 :.F..... ........ ........ U.......
1800000000000000 C605000000000000 5503000000000000 1700000000000000 ........ ........ U....... ........
The record found meets the condition and its LSN is 272902, so G2N2 must be recovered to LSN 272902.
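A plain grep also returns other piece types that carry the same byte pair (the HEAP_INSERT piece in the G2N1 dump above is one example), so the result has to be picked out by eye. A sketch that keeps only hits inside COMMIT pieces and prints the LSN of the enclosing log record; it assumes the gdump layout shown in this post, and find_commit_lsn is a hypothetical name:

```shell
# Read gdump log output on stdin and print the LSN of every log record
# whose COMMIT piece contains the given hex pattern on one line.
find_commit_lsn() {
  awk -v pat="$1" '
    /\[LOG #/ {
      in_commit = 0
      # "LSN(" is 4 characters; the digits run up to the closing ")"
      if (match($0, /LSN\([0-9]+\)/))
        lsn = substr($0, RSTART + 4, RLENGTH - 5)
    }
    /\[PIECE #/ { in_commit = ($0 ~ /TYPE\(COMMIT\)/) }
    in_commit && index($0, pat) { print lsn }
  '
}

# Usage:
#   gdump log redo_3_0.log -n 272885 \
#     | find_commit_lsn "5503000000000000 1700000000000000"
```

The match is line-based, so a pattern that straddles two hex rows of the dump is not found; that is the same limitation the grep has.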
[goldilocks@gs08 wal]$ gsql
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Connected to an idle instance.
gSQL> startup mount
Startup success
gSQL> alter database recover until change 272902;
Database altered.
gSQL> alter system open local database resetlogs;
System altered.
gSQL> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
853.24.1492
1 row selected.
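G2N1's SCN was found earlier to be 853.24.1339, so the two nodes in group 2 now agree on GCN and DCN (853.24) and their data is consistent.
All four nodes are now in LOCAL OPEN state, the GCN is the same on every node, and the DCN matches within each group, so open global database should succeed.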
[goldilocks@gs05 ~]$ sh select.sh "select status from x\$instance@local;"|grep -v "^$"
192.168.149.131
STATUS
----------
LOCAL OPEN
1 row selected.
192.168.149.132
STATUS
----------
LOCAL OPEN
1 row selected.
192.168.149.133
STATUS
----------
LOCAL OPEN
1 row selected.
192.168.149.134
STATUS
----------
LOCAL OPEN
1 row selected.
[goldilocks@gs05 ~]$ sh select.sh "select statement_view_scn() from dual;"|grep -v "^$"
192.168.149.131
STATEMENT_VIEW_SCN()
--------------------
853.26.488
1 row selected.
192.168.149.132
STATEMENT_VIEW_SCN()
--------------------
853.26.1066
1 row selected.
192.168.149.133
STATEMENT_VIEW_SCN()
--------------------
853.24.1339
1 row selected.
192.168.149.134
STATEMENT_VIEW_SCN()
--------------------
853.24.1492
1 row selected.
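The output above is checked by eye: the first field (GCN) must be equal on all four nodes, and the first two fields (GCN and DCN) must be equal within each group. A small sketch of that comparison, assuming SCNs are printed as gcn.dcn.lcn as they are here; both function names are hypothetical:

```shell
# Succeed when two SCNs of the form gcn.dcn.lcn have the same GCN.
same_gcn() {
  [ "${1%%.*}" = "${2%%.*}" ]
}

# Succeed when two SCNs agree on both GCN and DCN (the LCN is ignored).
same_gcn_dcn() {
  [ "${1%.*}" = "${2%.*}" ]
}

# Usage with the values above:
#   same_gcn 853.26.488 853.24.1339        # group 1 vs group 2: succeeds
#   same_gcn_dcn 853.24.1339 853.24.1492   # G2N1 vs G2N2: succeeds
```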
Run on G1N1:
gSQL> alter system open global database;
System altered.
gSQL> select LOCAL_MEMBER_NAME,STATUS from x$instance;
LOCAL_MEMBER_NAME STATUS
----------------- ------
G1N1 OPEN
G2N2 OPEN
G2N1 OPEN
G1N2 OPEN
4 rows selected.
gSQL> select count(*) from t1;
COUNT(*)
--------
50
1 row selected.
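All nodes of the database cluster have now been restored from the backup to a normal state.
Recovery by completing the data
In theory, appending the part of G2N2's current redo log that G2N1 lacks to G2N1's redo log would make the two nodes consistent, but that is cumbersome in practice. Instead, use another approach: rebalancing the data (rebalance database).
Resetting the environment
Shut down the database cluster first, then clear the current data files and copy the backup files into the data directory: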
gSQL> \cshutdown
Shutdown success
run_cmd.sh "rm -r ~/goldilocks_data/*"
run_cmd.sh "cp -r ~/backup/* ~/goldilocks_data/"
Try a one-step startup:
gSQL> \cstartup
ERR-42000(16403): of the total '4' nodes, '1' nodes failed to join the global database
Startup success
Again one node has not joined the cluster:
gSQL> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 853.26.489
G1N2 6 ACTIVE ACTIVE 853.26.0
G2N1 5 INACTIVE INACTIVE -1.-1.-1
G2N2 3 ACTIVE ACTIVE 853.36.0
4 rows selected.
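With more nodes, the members that failed to join can be picked out of this listing automatically. A sketch that reads the query output on stdin and prints the names of members whose logical or physical connection is INACTIVE; it assumes the column order shown above, and inactive_members is a hypothetical name:

```shell
# Print MEMBER_NAME for rows whose LOGICAL_CONNECTION (column 3) or
# PHYSICAL_CONNECTION (column 4) is INACTIVE.
inactive_members() {
  awk '$3 == "INACTIVE" || $4 == "INACTIVE" { print $1 }'
}

# Usage: pipe the cluster_member query output into inactive_members;
# for the listing above it prints G2N1.
```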
G2N1 has not joined the cluster. Try to join it manually:
[goldilocks@gs07 ~]$ gsql
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Connected to GOLDILOCKS Database.
gSQL> select * from x$instance@local;
VERSION STARTUP_TIME STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ---------- ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099212 LOCAL OPEN 1007 TRUE 2 5 G2N1 1
1 row selected.
gSQL> alter system join database;
ERR-42000(16405): of the total '6' tables in the database, '3' tables need to be rebalanced
System altered.
gSQL> alter database rebalance;
Database altered.
gSQL> select * from x$instance;
VERSION STARTUP_TIME STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ------ ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099212 OPEN 1007 TRUE 2 5 G2N1 1
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.100691 OPEN 1007 TRUE 2 3 G2N2 2
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099467 OPEN 1005 TRUE 1 6 G1N2 3
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:13.453708 OPEN 1008 TRUE 1 1 G1N1 0
4 rows selected.
gSQL> select count(*) from t1;
COUNT(*)
--------
62
1 row selected.
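All cluster nodes are now back to normal, and more data was recovered than with the previous method (62 rows versus 50).
Summary
Comparing the two methods, the second recovers more data and is also simpler to carry out.
Incremental backup (level backup)
In addition to full backups, goldilocks supports incremental backups, from level 0 to level 4.
Some backup-related properties: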
gSQL> select property_name,property_value,init_value,is_deprecated from v$PROPERTY where PROPERTY_NAME like '%BACKUP%';
PROPERTY_NAME PROPERTY_VALUE INIT_VALUE IS_DEPRECATED
------------------------------------ --------------------------------------- --------------------------------------- -------------
BACKUP_DIR /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup TRUE
DEFAULT_REMOVAL_OBSOLETE_BACKUP_LIST NO NO FALSE
DEFAULT_REMOVAL_BACKUP_FILE NO NO FALSE
READABLE_BACKUP_DIR_COUNT 1 1 FALSE
BACKUP_DIR_1 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_2 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_3 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_4 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_5 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_6 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_7 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_8 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_9 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
BACKUP_DIR_10 /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE
INCREMENTAL_BACKUP_SCAN_BUFFER_SIZE 32 32 FALSE
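The directories that hold incremental backups are set with BACKUP_DIR_1 through BACKUP_DIR_10 (BACKUP_DIR is deprecated). BACKUP_DIR_1 cannot be set at session level, and a system-level change only takes effect after the database is restarted. By default BACKUP_DIR_1 points to ${GOLDILOCKS_DATA}/backup; a production system should normally use a dedicated backup path.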
alter system set BACKUP_DIR_1 ='/home/goldilocks/increment_backup' scope=file;
cshutdown;cstartup
gSQL> select property_name,property_value,init_value,is_deprecated from v$PROPERTY where PROPERTY_NAME = 'BACKUP_DIR_1';
PROPERTY_NAME PROPERTY_VALUE INIT_VALUE IS_DEPRECATED
------------- --------------------------------- --------------------------------- -------------
BACKUP_DIR_1 /home/goldilocks/increment_backup /home/goldilocks/increment_backup FALSE
Level 0 backup
This is the base level; every other backup level depends on a level 0 backup.
gSQL> alter database backup incremental level 0 ;
Database altered.
gSQL> select * from v$INCREMENTAL_BACKUP;
BACKUP_NAME BACKUP_SCOPE INCREMENTAL_LEVEL INCREMENTAL_TYPE LSN BEGIN_TIME COMPLETION_TIME
-------------------------------- ------------ ----------------- ---------------- ------ -------------------------- --------------------------
databaseD20201016T110117L0S2.inc database 0 N/A 263709 2020-10-16 11:01:17.014740 2020-10-16 11:01:20.208249
controlD20201016T110120L0S2.inc control 0 N/A 263709 2020-10-16 11:01:20.223898 2020-10-16 11:01:20.226171
2 rows selected.
Look at the backup files that were actually generated:
[goldilocks@gs05 ~]$ run_cmd.sh "ls ~/increment_backup"
gs05
controlD20201016T110120L0S2.inc
databaseD20201016T110117L0S2.inc
----------------------------
gs06
controlD20201016T110118L0S1.inc
databaseD20201016T110116L0S1.inc
----------------------------
gs07
controlD20201016T110117L0S2.inc
databaseD20201016T110116L0S2.inc
----------------------------
gs08
controlD20201016T110121L0S2.inc
databaseD20201016T110116L0S2.inc
----------------------------
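The file names in the listing follow a visible pattern: the backup scope, then D plus a date, T plus a time, L plus the incremental level, and S plus a number. The Python sketch below splits a name into those parts. The pattern is inferred only from the names shown here, and the meaning of the S field is not explained in this article, so it is kept as an opaque number. The names BackupFile and parse_backup_name are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class BackupFile(NamedTuple):
    scope: str         # "database" or "control" in the listing above
    started: datetime  # timestamp embedded in the file name
    level: int         # incremental level (0 here)
    s_field: int       # number after "S"; its meaning is not documented here


# Assumed pattern: <scope>D<YYYYMMDD>T<HHMMSS>L<level>S<number>.inc
NAME_RE = re.compile(r"^([a-z]+)D(\d{8})T(\d{6})L(\d+)S(\d+)\.inc$")


def parse_backup_name(name: str) -> BackupFile:
    """Split an incremental backup file name into its visible components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised backup file name: {name}")
    scope, date, time, level, s_field = match.groups()
    started = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return BackupFile(scope, started, int(level), int(s_field))


info = parse_backup_name("databaseD20201016T110117L0S2.inc")
print(info.scope, info.started.isoformat(), info.level)
```

For the gs05 files, the embedded timestamps agree with the BEGIN_TIME column of v$INCREMENTAL_BACKUP shown earlier, truncated to whole seconds.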
Use the gdump tool to inspect the contents of a backup file:
[goldilocks@gs05 increment_backup]$ gdump backup databaseD20201016T110117L0S2.inc
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: databaseD20201016T110117L0S2.inc
TYPE: INCREMENTAL BACKUP
TIME: 2020-10-16 11:17:13.673234
===========================================================
INCREMENTAL BACKUP FILE HEADER
----------------------------------------------------------------------------
Backup object : DATABASE
Tablespace count : 6
Backup body size : 185942016
Last checkpoint lsn of previous incremental backup : 0
Max page lsn of incremental backup : 263709
Last checkpoint lsn of incremental backup : 263709
Last checkpoint lid of incremental backup : (15, 3710, 13)
Database signature : 691AA23C005B11EB89DBCBC21F1732DE
----------------------------------------------------------------------------
INCREMENTAL BACKUP FILE TAIL
----------------------------------------------------------------------------
Tablespace id(0) : backup page count(13022), start offset(8192)
Tablespace id(1) : backup page count(2122), start offset(106684416)
Tablespace id(2) : backup page count(2501), start offset(124067840)
Tablespace id(4) : backup page count(17), start offset(144556032)
Tablespace id(5) : backup page count(389), start offset(144695296)
Tablespace id(6) : backup page count(4647), start offset(147881984)
===========================================================
TIME: 2020-10-16 11:17:13.673502
As the output shows, the backup file contains tablespace and checkpoint information; there is no sign of commit.log or the location file.
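The numbers in the gdump output can be cross-checked with a little arithmetic. The Python sketch below parses the tail section and tests one hypothesis: that each backed-up page occupies 8192 bytes. That page size is an assumption inferred from the offsets, not something the tool states. Under it, the page counts add up exactly to the reported body size, and each tablespace starts where the previous one ends.

```python
import re

# Tail section copied from the gdump output above.
TAIL = """\
Tablespace id(0) : backup page count(13022), start offset(8192)
Tablespace id(1) : backup page count(2122), start offset(106684416)
Tablespace id(2) : backup page count(2501), start offset(124067840)
Tablespace id(4) : backup page count(17), start offset(144556032)
Tablespace id(5) : backup page count(389), start offset(144695296)
Tablespace id(6) : backup page count(4647), start offset(147881984)
"""
BODY_SIZE = 185942016  # "Backup body size" from the header
PAGE_SIZE = 8192       # assumption: inferred from the offsets, not documented

ROW_RE = re.compile(
    r"Tablespace id\((\d+)\) : backup page count\((\d+)\), start offset\((\d+)\)"
)


def parse_tail(text):
    """Return (tablespace_id, page_count, start_offset) tuples."""
    return [tuple(int(g) for g in m.groups()) for m in ROW_RE.finditer(text)]


def offsets_are_contiguous(rows, page_size):
    """True if every tablespace starts right after the previous one ends."""
    expected = rows[0][2]
    for _tbs_id, pages, offset in rows:
        if offset != expected:
            return False
        expected = offset + pages * page_size
    return True


rows = parse_tail(TAIL)
total_pages = sum(pages for _tbs_id, pages, _offset in rows)
print(total_pages, total_pages * PAGE_SIZE == BODY_SIZE)
```

The first start offset of 8192 would then correspond to a header of one page in front of the body, which is again an inference rather than a documented fact.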
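Recovery
Scenario 1: a single data file is damaged
Simulating the failure
Damage to a data file on a single node does not affect the cluster as a whole; restoring that one node from the backup is enough.
Pick a node and simulate the failure by deleting one of its data files, then continue using the database as a user would: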
gSQL> create table t2 (id int,name varchar(100)) sharding by hash (id ) shard count 2;
Table created.
gSQL> insert into t2 values (1,'after rm data file'),(2,'after rm data file');
1 row created.
gSQL> commit;
Commit complete.
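The cluster remains usable, and even the node whose data file was deleted reports no error: the data lives in memory, and until something is written to disk the database cannot detect the problem. Triggering a checkpoint manually on that node then raises errors (checkpoints on the healthy nodes succeed):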
gSQL> alter system checkpoint;
ERR-HY000(14106): tablespace (MEM_DATA_TBS) is taken offline as the result of a write error
ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/db/system_data.dbf) : stfOpen() returned errno(2)
System altered.
gSQL> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';
IS_ONLINE
---------
FALSE
gSQL> select count(*) from t1@local;
ERR-42000(14041): cannot access the OFFLINE tablespace 'MEM_DATA_TBS'
The tablespace is now offline, and querying this node's data fails.
Recovery
Start the failed node in mount state:
gSQL> shutdown
Shutdown success
gSQL> startup mount
Startup success
Meanwhile, keep using the database from a healthy node:
gSQL> insert into t2 values (3,'during g2n2 recovery'),(4,'during g2n2 recovery');
1 row created.
gSQL> commit;
Commit complete.
Because only one data file is damaged, restoring the tablespace that owns it is enough:
g2n2> alter database restore tablespace mem_data_tbs;
Database altered.
[2020-10-19 09:45:27.604274 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] begin (TABLESPACE, -1)
[2020-10-19 09:45:27.612729 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] recreate datafile ( /home/goldilocks/goldilocks_data/db/system_data.dbf )
[2020-10-19 09:45:27.612763 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[DATABASE FILE MANAGER] remove database file (/home/goldilocks/goldilocks_data/db/system_data.dbf) - success
[2020-10-19 09:45:27.612773 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[DATABASE FILE MANAGER] register database file (/home/goldilocks/goldilocks_data/db/system_data.dbf) - success
[2020-10-19 09:45:27.728515 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] recreate end
[2020-10-19 09:45:27.733664 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] LOAD DATAFILES
[2020-10-19 09:45:27.733726 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_dict.dbf' assigned to PARALLEL_IO_GROUP_1
[2020-10-19 09:45:27.733741 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_undo.dbf' assigned to PARALLEL_IO_GROUP_1
[2020-10-19 09:45:27.734024 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_aux.dbf' assigned to PARALLEL_IO_GROUP_1
[2020-10-19 09:45:27.734269 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_dict.dbf)
[2020-10-19 09:45:28.275547 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_dict.dbf)
[2020-10-19 09:45:28.275651 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_undo.dbf)
[2020-10-19 09:45:28.312210 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_undo.dbf)
[2020-10-19 09:45:28.312297 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_aux.dbf)
[2020-10-19 09:45:28.520626 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_aux.dbf)
[2020-10-19 09:45:28.522207 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] REFINE TABLESPACE AND DATAFILE
[2020-10-19 09:45:28.637237 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_data.dbf' assigned to PARALLEL_IO_GROUP_1
[2020-10-19 09:45:28.637435 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)
[2020-10-19 09:45:29.103868 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)
[2020-10-19 09:45:29.131161 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] merge datafile begin ( backup file - /home/goldilocks/increment_backup/databaseD20201016T234607L0S2.inc )
[2020-10-19 09:45:29.133506 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] merge datafile end
[2020-10-19 09:45:29.177209 INSTANCE(GOLDILOCKS) THREAD(19228,139887116277504)] [INFORMATION]
[PAGE FLUSHER] FLUSHED_LSN(-1)]
[2020-10-19 09:45:29.178274 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] end
[2020-10-19 09:45:29.180503 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[EVENT] alter database restore : SUCCESS
restore tablespace apparently takes the matching backup file from the level 0 backup and restores the tablespace to its state at backup time.
In the control file, the checkpoint LSN recorded for system_data.dbf is 37384:
NAME : MEM_DATA_TBS
ATTRIBUTES : MEMORY | PERSISTENT | DATA
STATE : CREATED
LOGGING STATE : LOGGING
ONLINE STATE : OFFLINE
EXTENT_SIZE : 32
RELATION_ID : 87467008983040
[DATAFILE #0]
SIZE : 209715200
AUTOEXTEN : OFF
NEXT : 0
MAXSIZE : 209715200
STATE : CREATED
NAME : "/home/goldilocks/goldilocks_data/db/system_data.dbf"
CHKPT LSN : 37384
CHKPT LID : (1, 1960, 13)
After the restore, the checkpoint LSN in the header of system_data.dbf is 36395:
[goldilocks@gs08 db]$ gdump data system_data.dbf -h
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
===========================================================
FILE: system_data.dbf
TYPE: DATAFILE HEADER
TIME: 2020-10-19 09:59:21.891048
===========================================================
FILE : system_data.dbf
Tablespace Physical Id : 2
Datafile Id : 0
Last Checkpoint Lsn : 36395
Last Checkpoint Lid : (0, 59107, 13)
Creation TIME : 2020-10-16 23:25:36.324841
Database signature : D5A450520FC311EBA5B049B54ADBD1B3
===========================================================
TIME: 2020-10-19 09:59:21.891889
===========================================================
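The control file records checkpoint LSN 37384 for system_data.dbf, while the header of the restored file says 36395. The Python sketch below pulls both numbers out of the two dumps and compares them. It assumes the labels appear exactly as printed above, and it treats "data file LSN lower than control file LSN" as the sign that redo still has to be applied; that rule is this walkthrough's reading, not a documented one. The helper extract_lsn is hypothetical.

```python
import re


def extract_lsn(dump_text: str, label: str) -> int:
    """Return the integer that follows '<label> :' in gdump-style output."""
    match = re.search(re.escape(label) + r"\s*:\s*(\d+)", dump_text)
    if match is None:
        raise ValueError(f"{label!r} not found in dump")
    return int(match.group(1))


# The two relevant lines, copied from the dumps above.
control_dump = "CHKPT LSN : 37384"
datafile_dump = "Last Checkpoint Lsn : 36395"

control_lsn = extract_lsn(control_dump, "CHKPT LSN")
datafile_lsn = extract_lsn(datafile_dump, "Last Checkpoint Lsn")

# Assumed rule: the restored file is older than the control file expects.
needs_recovery = datafile_lsn < control_lsn
print(control_lsn - datafile_lsn, needs_recovery)
```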
After the restore a recover is required; it has to replay redo from LSN 36395 up to the latest redo:
g2n2> alter database recover tablespace mem_data_tbs;
Database altered.
[2020-10-19 10:08:07.879322 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] LOAD DATAFILES
[2020-10-19 10:08:07.880219 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] REFINE TABLESPACE AND DATAFILE
[2020-10-19 10:08:07.905576 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_data.dbf' assigned to PARALLEL_IO_GROUP_1
[2020-10-19 10:08:07.905694 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)
[2020-10-19 10:08:08.012629 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)
[2020-10-19 10:08:08.058296 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] recover TABLESPACE - begin (COMPLETE, LSN 0)
[2020-10-19 10:08:08.058600 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] analysis begin
[2020-10-19 10:08:08.164847 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] read checkpoint log - checkpoint lid(0.59107.13), oldest lsn(36395), local scn(903.0.1046), time(2020-10-16 23:46:07.893176), transaction sequence(532), Grid sequence(12884902414)
[2020-10-19 10:08:08.191845 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] analysis done
[2020-10-19 10:08:08.191909 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] ready to redo - start lid(0.59107.0), start lsn(36395)
[2020-10-19 10:08:08.199126 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/archive_log/archive_0.log), lsn(36395)
[2020-10-19 10:08:08.205028 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/wal/redo_1_0.log), lsn(36397)
[2020-10-19 10:08:08.209624 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] recover TABLESPACE - end
[2020-10-19 10:08:08.252564 INSTANCE(GOLDILOCKS) THREAD(19228,139887116277504)] [INFORMATION]
[PAGE FLUSHER] FLUSHED_LSN(-1)]
[2020-10-19 10:08:08.260469 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[EVENT] alter database recover : SUCCESS
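The trace shows which log files the recovery manager replayed: first an archived log, then an online redo log. The Python sketch below extracts that list, assuming the trace keeps the "redo has performing" wording seen above. The function name replayed_logs is hypothetical.

```python
import re

# Matches the "[RECOVERY MANAGER] redo has performing" lines of the trace.
REDO_RE = re.compile(
    r"redo has performing - logfile\((?P<path>[^)]+)\), lsn\((?P<lsn>\d+)\)"
)


def replayed_logs(trace_text):
    """Return (logfile_path, starting_lsn) pairs in the order they were replayed."""
    return [(m.group("path"), int(m.group("lsn"))) for m in REDO_RE.finditer(trace_text)]


# Relevant lines copied from the trace above (timestamps omitted).
TRACE = """\
[RECOVERY MANAGER] ready to redo - start lid(0.59107.0), start lsn(36395)
[RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/archive_log/archive_0.log), lsn(36395)
[RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/wal/redo_1_0.log), lsn(36397)
[RECOVERY MANAGER] recover TABLESPACE - end
"""

for path, lsn in replayed_logs(TRACE):
    print(lsn, path)
```

A list like this makes it easy to confirm that the archived logs needed for a recover are still present before starting one.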
After the recover, open the database and bring the tablespace online:
g2n2> alter system open local database;
System altered.
g2n2> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';
IS_ONLINE
---------
FALSE
1 row selected.
g2n2> alter tablespace MEM_DATA_TBS online;
Tablespace altered.
g2n2> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';
IS_ONLINE
---------
TRUE
1 row selected.
g2n2> select count(*) from t1@local;
COUNT(*)
--------
24
1 row selected.
g2n2> select * from t2@local;
ID NAME
-- ------------------
1 after rm data file
1 row selected.
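The t1 data has been recovered, and t2 contains one row. The row is there because, although the data file was damaged, the redo log had already recorded the operations. t2 was created with shard count 2 so that, of the two test rows inserted together, one would land on the failed node.
Why only one row? Because the node has not yet rejoined the cluster, so the rows written to the cluster during the recovery have not been synchronized to it yet.
At this point g2n2 has not yet joined the cluster: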
g2n2> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
923.0.1101
g1n1> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
925.1.590
g2n2 is already 2 global transactions behind the cluster.
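The lag can be computed from the two statement_view_scn() values. The Python sketch below splits each SCN at the dots and compares the first components. Treating the first component as the global transaction counter follows this walkthrough's reading of the output and is an assumption, not a documented definition. The function names are hypothetical.

```python
from typing import Tuple


def parse_scn(text: str) -> Tuple[int, int, int]:
    """Split an SCN such as '923.0.1101' into its three integer parts."""
    first, second, third = (int(part) for part in text.strip().split("."))
    return first, second, third


def global_lag(local_scn: str, cluster_scn: str) -> int:
    """Difference of the first SCN components (assumed to be the global part)."""
    return parse_scn(cluster_scn)[0] - parse_scn(local_scn)[0]


# Values reported by g2n2 and g1n1 before the join ...
lag_before_join = global_lag("923.0.1101", "925.1.590")
# ... and the values the same two nodes report once g2n2 has joined.
lag_after_join = global_lag("934.0.1101", "934.0.590")
print(lag_before_join, lag_after_join)
```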
Join g2n2 to the cluster:
g2n2> alter system join database;
ERR-42000(16405): of the total '11' tables in the database, '1' tables need to be rebalanced
System altered.
g2n2> alter database rebalance;
Database altered.
g2n2> select * from t2@local order by 1;
ID NAME
-- --------------------
1 after rm data file
3 during g2n2 recovery
2 rows selected.
g2n2> select * from t2 order by 1;
ID NAME
-- --------------------
1 after rm data file
2 after rm data file
3 during g2n2 recovery
4 during g2n2 recovery
g2n2> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
934.0.1101
1 row selected.
SCN on another node of the cluster:
g1n1> select statement_view_scn() from dual;
STATEMENT_VIEW_SCN()
--------------------
934.0.590
1 row selected.
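After joining the cluster, the new data was synchronized over by the rebalance.
Scenario: the control file is damaged
By default there are two copies of the control file, both stored in the same path, $GOLDILOCKS_DATA/wal. Keeping both in one path gives little redundancy, so one copy can be moved to a different path: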
alter system set CONTROL_FILE_1 ='/home/goldilocks/goldilocks_data/backup/control_1.ctl' scope =file;
Shut down the database, copy the file, then restart the database:
run_cmd.sh "cp ~/goldilocks_data/wal/control_1.ctl ~/goldilocks_data/backup/control_1.ctl"
g1n1> select PROPERTY_VALUE from v$PROPERTY where property_name like '%CONTROL_FILE_1%';
PROPERTY_VALUE
-----------------------------------------------------
/home/goldilocks/goldilocks_data/backup/control_1.ctl
1 row selected.
Simulate the loss of a control file:
rm /home/goldilocks/goldilocks_data/backup/control_1.ctl
Then trigger a checkpoint:
g1n1> alter system checkpoint;
System altered.
The node that is missing the control file crashes outright:
[2020-10-19 14:51:34.611741 INSTANCE(GOLDILOCKS) THREAD(23155,139815192352512)] [INFORMATION]
[CHECKPOINT] begin
[2020-10-19 14:51:34.612304 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 0, datafile : 0 )
[2020-10-19 14:51:34.640728 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 1, datafile : 0 )
[2020-10-19 14:51:34.640876 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 0 )
[2020-10-19 14:51:34.641185 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 5, datafile : 0 )
[2020-10-19 14:51:34.641577 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 6, datafile : 0 )
[2020-10-19 14:51:34.646512 INSTANCE(GOLDILOCKS) THREAD(23155,139814644991744)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(41666), min flushed lsn (41709), flushed page count(0)
[2020-10-19 14:51:34.656874 INSTANCE(GOLDILOCKS) THREAD(23155,139815154587392)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(41709), flushed page count(1024)]
[2020-10-19 14:51:34.656972 INSTANCE(GOLDILOCKS) THREAD(23155,139814317840128)] [INFORMATION]
[ARCHIVING] stable lsn(41709)
[2020-10-19 14:51:34.667498 INSTANCE(GOLDILOCKS) THREAD(23155,139814317840128)] [FATAL]
[SYSTEM FATAL] CAUSE("can't save control file")(GOLDILOCKS) ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/backup/control_1.ctl): stfOpen() returned errno(2)
========================================================
(GOLDILOCKS) CALL STACK [Release 20c 20.1.1 revision(31618)]
========================================================
gmaster(kngAddErrorCallStackUnsafe+0xa3) [0xb60353]
gmaster(knlLogCallStackUnsafe+0x2a) [0xb3a13a]
gmaster(knlSystemFatal+0x35) [0xb475c5]
gmaster(smfSaveCtrlFile+0x76) [0xa6e066]
gmaster(smrArchivelogEventHandler+0x38) [0x9c5448]
gmaster(knlExecuteEnvEvent+0xb0) [0xb46740]
gmaster(ztmtArchivelogThread+0x320) [0x520700]
/lib64/libpthread.so.0(+0x7dd5) [0x7f2950d02dd5]
/lib64/libc.so.6(clone+0x6d) [0x7f295082402d]
(GOLDILOCKS) ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/backup/control_1.ctl): stfOpen() returned errno(2)
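Copy a healthy control file to the missing location, restart the database, and join the node to the cluster to recover.
If both control file copies are damaged, the cause is usually not the software; accidental deletion of the whole directory or a corrupted server file system is more likely. In that case the whole node has to be recovered; see the scenario where a single node loses all of its data.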
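The copy step for a lost control file can be sketched in Python as follows. The sketch assumes the instance on that node is stopped, as it is after the crash above. The function restore_missing_control_files is hypothetical, and the demonstration runs in a temporary directory that only mimics the file names used in this walkthrough.

```python
import shutil
import tempfile
from pathlib import Path


def restore_missing_control_files(paths):
    """Copy a surviving control file over each missing copy.

    Returns the paths that were re-created. Raises if no copy survives,
    which is the case handled by recovering the whole node instead.
    """
    paths = [Path(p) for p in paths]
    survivors = [p for p in paths if p.is_file()]
    if not survivors:
        raise FileNotFoundError("no control file copy survived")
    restored = []
    for path in paths:
        if not path.is_file():
            path.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(survivors[0], path)
            restored.append(path)
    return restored


# Demonstration in a throwaway directory; names mirror the walkthrough.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "wal" / "control_0.ctl"
    lost = Path(tmp) / "backup" / "control_1.ctl"
    good.parent.mkdir()
    good.write_bytes(b"control file contents")
    restored_names = [p.name for p in restore_missing_control_files([good, lost])]
    copies_match = lost.read_bytes() == good.read_bytes()

print(restored_names, copies_match)
```

Raising an error when no copy survives keeps the script from silently creating an empty control file.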
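Scenario 2: a single node loses all of its data
Accidentally deleting the whole database directory, a corrupted server file system, or the failure of several disks can all cause complete data loss.
There are two approaches.
Recovering from a level 0 backup
When to use it: the backup is fairly recent, the cost of recovery is lower than the cost of rejoining the cluster as a new member, and no archive logs have been lost.
Simulating the failure
Delete the entire data directory: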
rm -rf ~/goldilocks_data
g1n1> insert into t2 values (5,'after rm all g1n2 data file'),(6,'after rm all g1n2 data file');
2 rows created.
g1n1> commit;
Commit complete.
Recovery
[goldilocks@gs06 ~]$ gsql
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Connected to an idle instance.
g1n2> startup nomount;
ERR-HY000(13025): Property file does not exist (/home/goldilocks/goldilocks_data/conf/goldilocks.properties.conf)
The properties file is missing; copy the conf directory over from a healthy node.
g1n2> startup nomount;
ERR-HY000(13004): static shared memory segment is corrupted
ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/trc/system.trc): stfOpen() returned errno(2)
Create the db, wal, trc, backup and archive_log directories under $GOLDILOCKS_DATA:
mkdir wal db trc backup archive_log
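The same directory skeleton can be created idempotently from a script. The Python sketch below makes only the directories that are missing and reports which ones it created. The directory list comes from the mkdir command above, while the function name create_data_dirs is hypothetical. The demonstration uses a temporary directory in place of $GOLDILOCKS_DATA.

```python
import tempfile
from pathlib import Path

# Directory names taken from the mkdir command above.
REQUIRED_DIRS = ("db", "wal", "trc", "backup", "archive_log")


def create_data_dirs(data_home):
    """Create any missing directory under data_home; return the names created."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(data_home) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(name)
    return created


# Demonstration: a temporary directory stands in for $GOLDILOCKS_DATA.
with tempfile.TemporaryDirectory() as tmp:
    first_run = create_data_dirs(tmp)
    second_run = create_data_dirs(tmp)

print(first_run, second_run)
```

Running it a second time creates nothing, so it is safe to call before every startup attempt.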
g1n2> startup nomount;
Startup success
g1n2> alter system mount database;
ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/control_0.ctl'
g1n2> alter database restore controlfile from '/home/goldilocks/increment_backup/controlD20201019T160304L0S4.inc';
Database altered.
g1n2> alter system mount database;
ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/commit.log'
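Purpose of the commit log: when the MEM_TRANS_TBS tablespace fills up, the overflowing transaction records are stored in the commit log. The file does not matter for this recovery, so creating an empty one is enough: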
dd if=/dev/zero of=commit.log bs=1048576 count=100
g1n2> alter system mount database;
ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/wal/location.ctl) : stfOpen() returned errno(2)
location.ctl does not exist; copy one from another node.
g1n2> alter system mount database;
System altered.
g1n2> alter database restore;
Database altered.
g1n2> alter database recover;
ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/redo_0_0.log'
Check the current log files:
g1n2> select * from v$logfile;
GROUP_ID FILE_NAME GROUP_STATE FILE_SEQ FILE_SIZE
-------- ------------------------------------------------- ----------- -------- ---------
0 /home/goldilocks/goldilocks_data/wal/redo_0_0.log CURRENT 4 104857600
1 /home/goldilocks/goldilocks_data/wal/redo_1_0.log INACTIVE 1 104857600
2 /home/goldilocks/goldilocks_data/wal/redo_2_0.log INACTIVE 2 104857600
3 /home/goldilocks/goldilocks_data/wal/redo_3_0.log INACTIVE 3 104857600
4 rows selected.
Restore the redo logs from the archive logs:
g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_4.log /home/goldilocks/goldilocks_data/wal/redo_0_0.log
g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_1.log /home/goldilocks/goldilocks_data/wal/redo_1_0.log
g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_2.log /home/goldilocks/goldilocks_data/wal/redo_2_0.log
g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_3.log /home/goldilocks/goldilocks_data/wal/redo_3_0.log
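The four cp commands pair each redo file with the archive whose number equals the FILE_SEQ column of v$logfile. The Python sketch below generates the commands from those rows. The archive_<FILE_SEQ>.log naming is inferred from the commands above rather than taken from documentation, and redo_restore_commands is a hypothetical helper.

```python
ARCHIVE_DIR = "/home/goldilocks/goldilocks_data/archive_log"

# (FILE_NAME, FILE_SEQ) pairs copied from the v$logfile output above.
LOGFILES = [
    ("/home/goldilocks/goldilocks_data/wal/redo_0_0.log", 4),
    ("/home/goldilocks/goldilocks_data/wal/redo_1_0.log", 1),
    ("/home/goldilocks/goldilocks_data/wal/redo_2_0.log", 2),
    ("/home/goldilocks/goldilocks_data/wal/redo_3_0.log", 3),
]


def redo_restore_commands(logfiles, archive_dir):
    """Build one cp command per redo file from the archive with the same sequence."""
    return [
        f"cp {archive_dir}/archive_{seq}.log {redo_path}"
        for redo_path, seq in logfiles
    ]


commands = redo_restore_commands(LOGFILES, ARCHIVE_DIR)
for command in commands:
    print(command)
```

Deriving the commands from v$logfile avoids pairing a redo file with the wrong archive when the sequence numbers do not match the group numbers, as with group 0 and sequence 4 here.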
g1n2> alter database begin incomplete recovery;
ERR-01000(14104): Warning: suggestion '/home/goldilocks/goldilocks_data/archive_log/archive_4.log'
ERR-01000(14103): Warning: media recovery needs a logfile including log (Lsn 45702)
Database altered.
g1n2> alter database recover automatically;
ERR-01000(14104): Warning: suggestion '/home/goldilocks/goldilocks_data/archive_log/archive_5.log'
ERR-01000(14103): Warning: media recovery needs a logfile including log (Lsn 46421)
Database altered.
g1n2> alter database end incomplete recovery;
Database altered.
g1n2> alter system open local database resetlogs;
System altered.
g1n2> alter system join database;
ERR-42000(16405): of the total '11' tables in the database, '10' tables need to be rebalanced
System altered.
g1n2> alter database rebalance;
Database altered.
g1n2> select * from t2 order by 1;
ID NAME
-- ---------------------------
1 after rm data file
2 after rm data file
3 during g2n2 recovery
4 during g2n2 recovery
5 after rm all g1n2 data file
6 after rm all g1n2 data file
6 rows selected.
Recovery complete.
g1n2> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- -----------
G1N1 1 ACTIVE ACTIVE 1138.0.0
G1N2 6 ACTIVE ACTIVE 1138.0.1264
G2N1 5 ACTIVE ACTIVE 1138.0.0
G2N2 7 ACTIVE ACTIVE 1138.0.0
4 rows selected.
Joining the cluster as a new member
When to use it: the backup was lost together with the data, so there is no way left to restore the data.
Delete the database files on g1n2:
rm archive_log/* backup/* db/* wal/*
Check the cluster status:
g1n1> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 1140.0.674
G1N2 6 INACTIVE INACTIVE -1.-1.-1
G2N1 5 ACTIVE ACTIVE 1140.0.0
G2N2 7 ACTIVE ACTIVE 1140.0.0
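The member listing can be scanned for nodes that have dropped out. The Python sketch below assumes the five-column layout shown above and returns the names of members whose logical or physical connection is INACTIVE. The function name inactive_members is hypothetical.

```python
# Query output copied from the session above.
MEMBER_TABLE = """\
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 1140.0.674
G1N2 6 INACTIVE INACTIVE -1.-1.-1
G2N1 5 ACTIVE ACTIVE 1140.0.0
G2N2 7 ACTIVE ACTIVE 1140.0.0
"""


def inactive_members(table_text):
    """Return names of members with an INACTIVE logical or physical connection."""
    names = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have five fields and a numeric MEMBER_ID; this skips
        # the header line and the dashed separator.
        if len(fields) == 5 and fields[1].isdigit():
            if "INACTIVE" in (fields[2], fields[3]):
                names.append(fields[0])
    return names


print(inactive_members(MEMBER_TABLE))
```

The result identifies which member has to be dropped with alter database drop inactive cluster members before the node is added again.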
Recreate the database on g1n2:
[goldilocks@gs06 goldilocks_data]$ gcreatedb --cluster --db_name='glodilocks' --timezone='+08:00' --character_set="UTF8" --char_length_units="OCTETS" --member='g1n2' --host=192.168.149.132 --port=10120
Copyright © 2010 SUNJESOFT Inc. All rights reserved.
Release 20c 20.1.1 revision(31618)
Database created
g1n1> alter database drop inactive cluster members;
Database altered.
g1n1> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 1142.0.674
G2N1 5 ACTIVE ACTIVE 1142.0.0
G2N2 7 ACTIVE ACTIVE 1142.0.0
3 rows selected.
On the new node:
g1n2> startup
Startup success
g1n2> alter system open global database;
System altered.
Add the node to the cluster:
g1n1> alter cluster group g1 add cluster member g1n2 host '192.168.149.132' port 10120;
Cluster Group altered.
The new node needs a data rebalance:
g1n2> alter database rebalance;
Database altered.
g1n2> select * from t2@local order by 1;
ID NAME
-- ---------------------------
2 after rm data file
4 during g2n2 recovery
6 after rm all g1n2 data file
3 rows selected.
g1n1> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;
MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN
----------- --------- ------------------ ------------------- ----------
G1N1 1 ACTIVE ACTIVE 1155.0.675
G1N2 8 ACTIVE ACTIVE 1155.0.0
G2N1 5 ACTIVE ACTIVE 1155.0.0
G2N2 7 ACTIVE ACTIVE 1155.0.0
The cluster is back to normal.