GOLDILOCKS Distributed Database Backup and Recovery in Practice

Environment

Database version:

Release 20c 20.1.1 revision(31618)

Cluster information

gSQL> select STATUS,LOCAL_MEMBER_NAME from x$instance;
STATUS LOCAL_MEMBER_NAME
------ -----------------
OPEN   G2N2             
OPEN   G1N2             
OPEN   G2N1             
OPEN   G1N1        

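Enabling archive mode

Check whether archiving is enabled on every node in the cluster. Backups cannot be taken without it. To enable archiving, start the database on each node in mount state, run alter database archivelog, and then open the database.

I wrote a simple script to query the archive status of every node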

[goldilocks@gs05 ~]$ sh select.sh "select ARCHIVELOG_MODE from v\$archivelog;"
192.168.149.131

ARCHIVELOG_MODE
---------------
ARCHIVELOG   

192.168.149.132

ARCHIVELOG_MODE
---------------
ARCHIVELOG     

192.168.149.133

ARCHIVELOG_MODE
---------------
ARCHIVELOG     

192.168.149.134

ARCHIVELOG_MODE
---------------
ARCHIVELOG     
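
The select.sh helper itself is not shown in this post. A minimal sketch of what such a script could look like follows, assuming passwordless ssh to the four node addresses printed above and the sys/gliese login used by the insert loop later in the post. The NODES and REMOTE_SHELL variables and the query_all_nodes function are illustrative names, not part of GOLDILOCKS.

```shell
#!/bin/sh
# Hypothetical reconstruction of select.sh: run one SQL statement on every node.
# The node list and the ssh invocation are assumptions; adjust them to your cluster.
NODES="${NODES:-192.168.149.131 192.168.149.132 192.168.149.133 192.168.149.134}"
# REMOTE_SHELL can be replaced by a stub to dry-run the loop without ssh.
REMOTE_SHELL="${REMOTE_SHELL:-ssh}"

query_all_nodes() {
    sql="$1"
    for node in $NODES; do
        # Print the node address first, as in the output shown above.
        echo "$node"
        # Load the profile on the remote side so gsql is on PATH,
        # then feed the statement to gsql on stdin.
        $REMOTE_SHELL "$node" "source ~/.bash_profile; gsql sys gliese --no-prompt" <<EOF
$sql
EOF
    done
}

# Only query when a statement is passed on the command line.
if [ "$#" -gt 0 ]; then
    query_all_nodes "$1"
fi
```

As in the invocation above, a `$` in a view name such as v$archivelog has to be escaped when the statement is passed in double quotes.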

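Full backup

No data changes during the backup

Start with the simplest case: assume nothing changes in the database while the backup is taken, so copying all database files is enough to back it up.

Starting the backup

Create a test table and insert one row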

gSQL> create table t1 (id int,time timestamp);
Table created.
gSQL> insert into t1 values (100,sysdate);
1 row created.
gSQL> commit;
Commit complete.
gSQL> select * from t1;
 ID TIME                      
--- --------------------------
100 2020-10-10 23:22:32.000000
1 row selected.

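Write the backup script

#!/bin/bash
source ~/.bash_profile

# Put the database into backup mode before copying any files.
echo "alter database begin backup;" | gsql --no-prompt

echo "begin backup"

# Copy the data directory on every node in parallel.
# \$GOLDILOCKS_DATA is escaped so that it is expanded on the remote node,
# after the remote ~/.bash_profile has been sourced.
ssh gs05 "source ~/.bash_profile; cp -r \$GOLDILOCKS_DATA/* ~/backup" &
ssh gs06 "source ~/.bash_profile; cp -r \$GOLDILOCKS_DATA/* ~/backup" &
ssh gs07 "source ~/.bash_profile; cp -r \$GOLDILOCKS_DATA/* ~/backup" &
ssh gs08 "source ~/.bash_profile; cp -r \$GOLDILOCKS_DATA/* ~/backup" &

# Wait for all copies to finish, then leave backup mode.
wait
echo "alter database end backup;" | gsql --no-prompt

echo "end backup"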

The principle: after alter database begin backup has been executed, copy all database files, including the data files, the control files, and the other files under the wal path.

After the backup finishes, insert another test row

gSQL> insert into t1 values (200,sysdate);

1 row created.

gSQL> commit;

Commit complete.

gSQL> select * from t1;

 ID TIME                      
--- --------------------------
100 2020-10-10 23:22:32.000000
200 2020-10-11 10:53:14.000000

2 rows selected.

Recovery

Simulate a cluster failure.

Delete the data files on every node in the cluster to simulate the worst case.

[goldilocks@gs05 ~]$ run_cmd.sh "rm -r ~/goldilocks_data"

Startup fails with an error

[goldilocks@gs05 ~]$ gsql

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)


Connected to an idle instance.

gSQL> startup

ERR-HY000(13025): Property file does not exist (/home/goldilocks/goldilocks_data/conf/goldilocks.properties.conf)

gSQL> 

Starting the recovery

Copy the backed-up database files back to their original location

run_cmd.sh "mkdir ~/goldilocks_data"
run_cmd.sh "cp -r  ~/backup/* ~/goldilocks_data"
[goldilocks@gs05 goldilocks_data]$ gsqlnet

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)


Connected to an idle instance.

gSQL> \cstartup

Startup success

gSQL> select * from x$instance;

VERSION                            STARTUP_TIME               STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ------ ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:37.650265 OPEN         1008 TRUE                    1               1 G1N1                                  0
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.239340 OPEN         1007 TRUE                    2               3 G2N2                                  2
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.238691 OPEN         1007 TRUE                    2               5 G2N1                                  1
Release 20c 20.1.1 revision(31618) 2020-10-11 11:06:38.234501 OPEN         1005 TRUE                    1               4 G1N2                                  3

4 rows selected.

gSQL> select * from t1;

 ID TIME                      
--- --------------------------
100 2020-10-10 23:22:32.000000

1 row selected.

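The row inserted before the backup is present; the row inserted after the backup is gone.

Data changes during the backup

The database supports online backup, and users can keep using it while the backup runs. As I understand the documentation, once alter database begin backup has been executed, dirty pages are no longer flushed to the data files, not even by a manual checkpoint, while redo logs continue to be written and archived.

  • Normal checkpoint flow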
[2020-10-12 14:07:14.444515 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin

[2020-10-12 14:07:14.445851 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 0, datafile : 0 )

[2020-10-12 14:07:14.474179 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 1, datafile : 0 )

[2020-10-12 14:07:14.474493 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 0 )

[2020-10-12 14:07:14.474706 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 1 )

[2020-10-12 14:07:14.475220 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 5, datafile : 0 )

[2020-10-12 14:07:14.476225 INSTANCE(GOLDILOCKS) THREAD(62993,140466467100416)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 6, datafile : 0 )

[2020-10-12 14:07:14.485063 INSTANCE(GOLDILOCKS) THREAD(62993,140466448217856)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(262521), min flushed lsn (262543), flushed page count(0)

[2020-10-12 14:07:14.495049 INSTANCE(GOLDILOCKS) THREAD(62993,140466485982976)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(262543), flushed page count(2048)]

[2020-10-12 14:07:14.495178 INSTANCE(GOLDILOCKS) THREAD(62993,140466051852032)] [INFORMATION]
[ARCHIVING] stable lsn(262543)

[2020-10-12 14:07:14.547931 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin - checkpoint lid(15.2915.13), checkpoint lsn(262544), oldest lsn(262544)

[2020-10-12 14:07:14.548092 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] body - checkpoint lid(-1.0.0), checkpoint lsn(-1), active transaction count(0)

[2020-10-12 14:07:14.548115 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end - checkpoint lid(15.2915.117), checkpoint lsn(262545)

[2020-10-12 14:07:14.548126 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] flush redo log

[2020-10-12 14:07:14.552784 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] save control file

[2020-10-12 14:07:14.565377 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end

  • Checkpoint flow after alter database begin backup
[2020-10-12 14:09:03.485033 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin

[2020-10-12 14:09:03.496058 INSTANCE(GOLDILOCKS) THREAD(62993,140466448217856)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(262545), min flushed lsn (262547), flushed page count(0)

[2020-10-12 14:09:03.506912 INSTANCE(GOLDILOCKS) THREAD(62993,140466485982976)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(262547), flushed page count(0)]

[2020-10-12 14:09:03.507426 INSTANCE(GOLDILOCKS) THREAD(62993,140466051852032)] [INFORMATION]
[ARCHIVING] stable lsn(262547)

[2020-10-12 14:09:03.532187 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] begin - checkpoint lid(15.2917.13), checkpoint lsn(262548), oldest lsn(262548)

[2020-10-12 14:09:03.532302 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] body - checkpoint lid(-1.0.0), checkpoint lsn(-1), active transaction count(0)

[2020-10-12 14:09:03.532318 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end - checkpoint lid(15.2917.117), checkpoint lsn(262549)

[2020-10-12 14:09:03.532326 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] flush redo log

[2020-10-12 14:09:03.535894 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] save control file

[2020-10-12 14:09:03.589138 INSTANCE(GOLDILOCKS) THREAD(62993,140466936956672)] [INFORMATION]
[CHECKPOINT] end
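
The difference between the two traces is easiest to see in the PAGE FLUSHER lines: 2048 pages flushed in the normal checkpoint, 0 after begin backup. A small sketch for pulling those counters out of a trace file, assuming the line format shown above; page_flusher_counts is an illustrative helper name, and the trace file location is left to the reader because the post does not state it.

```shell
#!/bin/sh
# Read trace log lines on stdin and print "<flushed lsn> <flushed page count>"
# for every PAGE FLUSHER entry. All other lines are ignored.
page_flusher_counts() {
    sed -n 's/.*\[PAGE FLUSHER] flushed lsn(\([0-9]*\)), flushed page count(\([0-9]*\)).*/\1 \2/p'
}
```

Feeding the trace file to the function on stdin lists one line per checkpoint, so a run of zero counts marks the window in which the database was in backup mode.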

Simulating data changes

Use a simple script to simulate users continuously working with the database

for i in {1..100}
do
 echo "insert into t1 values ($i,sysdate);"|gsql sys gliese --no-prompt
 sleep 1
done

Back up the database while the script is running

[goldilocks@gs05 ~]$ sh backup.sh 

Database altered.

begin backup

Database altered.

end backup 

After the insert script finishes, table t1 holds 100 rows

gSQL> select count(*) from t1;

COUNT(*)
--------
     100

1 row selected.

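Data was being written throughout this backup. The backup therefore contains the data that was already on disk before the backup began, plus the redo log written while the backup ran. Because redo grows in real time, the redo files copied from the individual nodes are not guaranteed to be consistent with each other, so the restore needs some extra handling.

Starting the recovery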

run_cmd.sh "mv  ~/goldilocks_data/ ~/goldilocks_data_bak "
run_cmd.sh "cp -r  ~/backup/ ~/goldilocks_data"

After starting the listener on every node, try to start the cluster directly with gsqlnet

gSQL> \cstartup
ERR-42000(16403): of the total '4' nodes, '1' nodes failed to join the global database
ERR-HY000(40061): currently connected node is inactive
Startup success

One node did not join the cluster

gSQL> select * from x$instance;

ERR-HY000(16354): connection of member 'G2N1' is broken

G2N1 is the node that did not join the cluster

select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN 
----------- --------- ------------------ ------------------- ----------
G1N1                1 ACTIVE             ACTIVE              853.26.489
G1N2                6 ACTIVE             ACTIVE              853.26.0  
G2N1                5 INACTIVE           INACTIVE            -1.-1.-1  
G2N2                3 ACTIVE             ACTIVE              853.36.0  

Check the state of G2N1: it is in LOCAL OPEN state and has indeed not joined the cluster.

gSQL> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
853.24.1339         

1 row selected.

gSQL>  select * from x$instance@local;

VERSION                            STARTUP_TIME               STATUS     OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ---------- ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-14 16:14:40.564452 LOCAL OPEN       1007 TRUE                    2               5 G2N1                                  1

1 row selected.

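The SCN of G2N1 is 853.24.1339, while G2N2 in the same group is at 853.36.1493. The DCN differs within the group, so the data in the group is inconsistent and G2N2 holds more data than G2N1. The likely cause is that the redo files of the two nodes did not have identical content at the moment they were copied. Let's dump the redo logs to verify this.

  • G2N1 node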

    [goldilocks@gs07 wal]$ gdump control control_1.ctl -s  log
    
     Copyright © 2010 SUNJESOFT Inc. All rights reserved.
     Release 20c 20.1.1 revision(31618)
    
    ===========================================================
    FILE: control_1.ctl
    TYPE: CONTROLFILE
    TIME: 2020-10-14 17:09:22.501152
    ===========================================================
    
     [LOG SECTION]
    -----------------------------------------------------------
       DATABASE CREATION TIME : 2020-09-30 11:06:01.643110
    
      [CHECKPOINT]
        LID                               :  13,3564,13
        LSN                               :  61523
        RESETLOG LSN                      :  -1
        ARCHIVELOG MODE                   :  ARCHIVELOG
        LAST INACTIVATED LOGFILE SEQUENCE :  12
    
      [LOG STREAM]
        STATE          :  ACTIVE
        GROUP COUNT    :  4
        BLOCK SIZE     :  512
        FILE SEQUENCE  :  13
    
        [LOG GROUP #0]
          STATE         : INACTIVE
          SIZE          : 104857600
          MEMBER COUNT  : 1
          FILE SEQUENCE : 12
          PREV LAST LSN : 54716
          MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_0_0.log"
    
        [LOG GROUP #1]
          STATE         : CURRENT
          SIZE          : 104857600
          MEMBER COUNT  : 1
          FILE SEQUENCE : 13
          PREV LAST LSN : 55104
          MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_1_0.log"
    
        [LOG GROUP #2]
          STATE         : INACTIVE
          SIZE          : 104857600
          MEMBER COUNT  : 1
          FILE SEQUENCE : 10
          PREV LAST LSN : 48765
          MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_2_0.log"
    
        [LOG GROUP #3]
          STATE         : INACTIVE
          SIZE          : 104857600
          MEMBER COUNT  : 1
          FILE SEQUENCE : 11
          PREV LAST LSN : 54713
          MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_3_0.log"
    
    ===========================================================
    TIME: 2020-10-14 17:09:22.512727
    ===========================================================
    
    

    In the backup, the redo log whose state is CURRENT is redo_1_0.log, and the checkpoint LSN is 61523.

    Next, look at how many redo records reached disk after the checkpoint

    [goldilocks@gs07 wal]$ gdump log redo_1_0.log -n  61523
    
     Copyright © 2010 SUNJESOFT Inc. All rights reserved.
     Release 20c 20.1.1 revision(31618)
    
    ===========================================================
    FILE: redo_1_0.log
    TYPE: LOGFILE PAGE
    TIME: 2020-10-14 17:11:50.149410
    ===========================================================
    
    ===========================================================
     [LOG FILE HEADER]
    -----------------------------------------------------------
     LOG_GROUP_ID    : 1
     BLOCK_SIZE      : 512
     FILE_SIZE       : 104857600
     FILE_SEQUENCE   : 13
     PREV_LAST_LSN   : 55104
     CREATION TIME   : 2020-10-11 22:07:12.290608
     SIGNATURE       : DEE8DD1A02C911EBA465E9A84E6E413C
    ===========================================================
    
    [LOG #0] : BLOCK(3564), LSN(61523), SIZE(64), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(746), RID(0,-1,0)
    [PIECE #0] : TYPE(CHKPT_BEGIN), TIME(2020-10-12 14:25:53.233884), SIZE(48), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
     53F0000000000000 5503000000000000 1000000000000000 2D05000000000000    S....... U....... ........ -.......
     C802000011000000 DC6FCA5E73B10500                                      ........ .o.^s...                  
    
    [LOG #1] : BLOCK(3564), LSN(61524), SIZE(16), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(0), RID(0,-1,0)
    [PIECE #0] : TYPE(CHKPT_END), SIZE(0), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
    
    [LOG #2] : BLOCK(3565), LSN(61525), SIZE(561), PIECE_COUNT(7), TRANS_ID(83D0003003A), TRANS_SEQ(746), RID(0,0,1726)
    [PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
     0D00030004000000 B4EF000000000000 D598FD2D7FB00500 02000400C30A0000    ........ ........ ...-.... ........
     00000000BE060000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 02001000E60A0000                     ........ ........ ........         
    [PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2755)
     0400030000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 00000080                             ........ ........ ....             
    [PIECE #2] : TYPE(BYTES), SIZE(4), CLASS(RECOVERY), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2790)
     04000300                                                               ....                               
    [PIECE #3] : TYPE(INIT_PAGE_BODY), SIZE(42), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,120,2790)
     1100A00010000408 0400F81F00000000 0100000000000000 0100000000000000    ........ ........ ........ ........
     0900000000000000 0100                                                  ........ ..                        
    [PIECE #4] : TYPE(INSERT_TRANSACTION_RECORD), SIZE(128), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,3,2109)
     0200060967000000 670000006800FFFF 010003003D080000 3A0003003D080000    ....g... g...h... ....=... :...=...
     00000000FFFFFFFF 0400E60200000000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF    ........ ........ ........ ........
     FFFFFFFFFFFFFFFF 5503000000000000 1000000000000000 2D05000000000000    ........ U....... ........ -.......
     FFFFFFFFFFFFFFFF 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
    [PIECE #5] : TYPE(INSERT_UNDO_RECORD), SIZE(24), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,4,2109)
     0000090A67000000 BE06000000001000 02000000E60A0000                     ....g... ........ ........         
    [PIECE #6] : TYPE(HEAP_INSERT), SIZE(79), CLASS(TABLE), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
     3A0003003D080000 5503000000000000 1000000000000000 2D05000000000000    :...=... U....... ........ -.......
     0000000044010000 0000000000000002 1500000000000002 C1290880C01B97B7    ....D... ........ ........ .)......
     D7F202EB8100001A 00000004000100                                        ........ .......                   
    
    .... (some records omitted)
    
    [LOG #16] : BLOCK(3579), LSN(61539), SIZE(561), PIECE_COUNT(7), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,1726)
    [PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,3238)
     0D00030004000000 E5EF000000000000 D598FD2D7FB00500 02000400830C0000    ........ ........ ...-.... ........
     00000000BE060000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 02001700A60C0000                     ........ ........ ........         
    [PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,3203)
     0400030000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
     0000000000000000 0000000000000000 00000080                             ........ ........ ....             
    [PIECE #2] : TYPE(BYTES), SIZE(4), CLASS(RECOVERY), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,3238)
     04000300                                                               ....                               
    [PIECE #3] : TYPE(INIT_PAGE_BODY), SIZE(42), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,120,3238)
     1100A00010000408 0400F81F00000000 0100000000000000 0100000000000000    ........ ........ ........ ........
     0900000000000000 0100                                                  ........ ..                        
    [PIECE #4] : TYPE(INSERT_TRANSACTION_RECORD), SIZE(128), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,24,2109)
     0200060967000000 670000006800FFFF 010018003D080000 3A0018003D080000    ....g... g...h... ....=... :...=...
     00000000FFFFFFFF 0400ED0200000000 FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF    ........ ........ ........ ........
     FFFFFFFFFFFFFFFF 5503000000000000 1700000000000000 2D05000000000000    ........ U....... ........ -.......
     FFFFFFFFFFFFFFFF 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
    [PIECE #5] : TYPE(INSERT_UNDO_RECORD), SIZE(24), CLASS(TRANSACTION), REDO_TYPE(UNDO), PROPAGATE_LOG(YES), RID(1,25,2109)
     0000090A67000000 BE06000000001700 02000000A60C0000                     ....g... ........ ........         
    [PIECE #6] : TYPE(HEAP_INSERT), SIZE(79), CLASS(TABLE), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,3238)
     3A0018003D080000 5503000000000000 1700000000000000 2D05000000000000    :...=... U....... ........ -.......
     0000000044010000 0000000000000002 1500000000000002 C13008C01DE297B7    ....D... ........ ........ .0......
     D7F202F98100001A 00000004000100                                        ........ .......                   
    
    [LOG #17] : BLOCK(3580), LSN(61540), SIZE(170), PIECE_COUNT(2), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,103)
    [PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
     5503000000000000 1800000000000000 2D05000000000000 010000000000BE06    U....... ........ -....... ........
     0000                                                                   ..                                 
    [PIECE #1] : TYPE(COMMIT), TRANS_ID(83D0018003A), TIME(2020-10-12 14:26:07.269999), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,24,2109)
     3A0018003D080000 00000000FFFFFFFF 0400ED0200000000 5503000000000000    :...=... ........ ........ U.......
     1800000000000000 2D05000000000000 5503000000000000 1700000000000000    ........ -....... U....... ........
     2D05000000000000 63F0000000000000 6F9CA05F73B10500 CF02000011000000    -....... c....... o.._s... ........
     0000000000000000                                                       ........                           
    

    17 redo records in total (the last one is LOG #17)

  • G2N2 node

[goldilocks@gs08 wal]$ gdump control control_1.ctl -s log

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)

===========================================================
FILE: control_1.ctl
TYPE: CONTROLFILE
TIME: 2020-10-14 17:20:34.265815
===========================================================

 [LOG SECTION]
-----------------------------------------------------------
   DATABASE CREATION TIME : 2020-09-27 08:51:16.608945

  [CHECKPOINT]
    LID                               :  15,3253,13
    LSN                               :  272885
    RESETLOG LSN                      :  -1
    ARCHIVELOG MODE                   :  ARCHIVELOG
    LAST INACTIVATED LOGFILE SEQUENCE :  14

  [LOG STREAM]
    STATE          :  ACTIVE
    GROUP COUNT    :  4
    BLOCK SIZE     :  512
    FILE SEQUENCE  :  15

    [LOG GROUP #0]
      STATE         : INACTIVE
      SIZE          : 104857600
      MEMBER COUNT  : 1
      FILE SEQUENCE : 12
      PREV LAST LSN : 260402
      MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_0_0.log"

    [LOG GROUP #1]
      STATE         : INACTIVE
      SIZE          : 104857600
      MEMBER COUNT  : 1
      FILE SEQUENCE : 13
      PREV LAST LSN : 266354
      MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_1_0.log"

    [LOG GROUP #2]
      STATE         : INACTIVE
      SIZE          : 104857600
      MEMBER COUNT  : 1
      FILE SEQUENCE : 14
      PREV LAST LSN : 266357
      MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_2_0.log"

    [LOG GROUP #3]
      STATE         : CURRENT
      SIZE          : 104857600
      MEMBER COUNT  : 1
      FILE SEQUENCE : 15
      PREV LAST LSN : 266617
      MEMBER #0     : (ACTIVE) "/home/goldilocks/goldilocks_data/wal/redo_3_0.log"

===========================================================
TIME: 2020-10-14 17:20:34.287068
===========================================================

redo_3_0.log is the CURRENT log, and the checkpoint LSN is 272885

[goldilocks@gs08 wal]$ gdump log redo_3_0.log -n 272885

 Release 20c 20.1.1 revision(31618)

===========================================================
FILE: redo_3_0.log
TYPE: LOGFILE PAGE
TIME: 2020-10-14 17:22:07.267428
===========================================================

===========================================================
 [LOG FILE HEADER]
-----------------------------------------------------------
 LOG_GROUP_ID    : 3
 BLOCK_SIZE      : 512
 FILE_SIZE       : 104857600
 FILE_SEQUENCE   : 15
 PREV_LAST_LSN   : 266617
 CREATION TIME   : 2020-10-11 22:07:12.247585
 SIGNATURE       : 8C9D5600005B11EB8554BF163FAF3B79
===========================================================

[LOG #0] : BLOCK(3253), LSN(272885), SIZE(64), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(1537), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_BEGIN), TIME(2020-10-12 14:25:53.411774), SIZE(48), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)
 F529040000000000 5503000000000000 1000000000000000 C605000000000000    .)...... U....... ........ ........
 F305000019000000 BE26CD5E73B10500                                      ........ .&.^s...                  

[LOG #1] : BLOCK(3253), LSN(272886), SIZE(16), PIECE_COUNT(1), TRANS_ID(FFFFFFFFFFFFFFFE), TRANS_SEQ(0), RID(0,-1,0)
[PIECE #0] : TYPE(CHKPT_END), SIZE(0), CLASS(RECOVERY), REDO_TYPE(CONTROL_FILE), PROPAGATE_LOG(NO), RID(0,-1,0)

[LOG #2] : BLOCK(3254), LSN(272887), SIZE(561), PIECE_COUNT(7), TRANS_ID(3A80031003A), TRANS_SEQ(1537), RID(0,0,1591)
[PIECE #0] : TYPE(INIT_PAGE), SIZE(120), CLASS(PAGE_ACCESS), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,0,2790)
 0D00030004000000 5729040000000000 7D4AEDF240B00500 02000400C30A0000    ........ W)...... }J..@... ........
 0000000037060000 0000000000000000 0000000000000000 0000000000000000    ....7... ........ ........ ........
 0000000000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........
 0000000000000000 0000000000000000 02001000E60A0000                     ........ ........ ........         
[PIECE #1] : TYPE(BITMAP_UPDATE_LEAF_STATUS), SIZE(52), CLASS(SEGMENT), REDO_TYPE(PAGE), PROPAGATE_LOG(YES), RID(2,4,2755)
 0400030000000000 0000000000000000 0000000000000000 0000000000000000    ........ ........ ........ ........

... 


[LOG #41] : BLOCK(3293), LSN(272926), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A8006A003A), TRANS_SEQ(1556), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
 5503000000000000 2400000000000000 C605000000000000 0100000000003706    U....... $....... ........ ......7.
 0000                                                                   ..                                 
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A8006A003A), TIME(2020-10-12 14:26:36.482417), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,106,936)
 3A006A00A8030000 00000000FFFFFFFF 0400050300000000 5503000000000000    :.j..... ........ ........ U.......
 2400000000000000 C605000000000000 5503000000000000 2300000000000000    $....... ........ U....... #.......
 C605000000000000 1D2A040000000000 715B5E6173B10500 0606000019000000    ........ .*...... q[^as... ........
 0000000000000000                                                       ........                           


G2N2 has 41 records after the checkpoint, more than G2N1
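
Scrolling through a full dump to find the last record is tedious. A minimal sketch that reports only the index and LSN of the last [LOG #n] header, assuming the gdump log output format shown above; last_log_record is an illustrative helper name.

```shell
#!/bin/sh
# Read "gdump log" output on stdin and print "<index> <lsn>" of the last
# [LOG #n] header. Leading indentation before the header is tolerated.
last_log_record() {
    sed -n 's/^ *\[LOG #\([0-9]*\)] : BLOCK([0-9]*), LSN(\([0-9]*\)).*/\1 \2/p' | tail -n 1
}
```

Piping the two dumps from this section through the function, for example `gdump log redo_1_0.log -n 61523 | last_log_record`, gives one line per node that can be compared directly.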

Next, verify the SCN recorded by the last commit on each node

G2N1
[LOG #17] : BLOCK(3580), LSN(61540), SIZE(170), PIECE_COUNT(2), TRANS_ID(83D0018003A), TRANS_SEQ(753), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
 5503000000000000 1800000000000000 2D05000000000000 010000000000BE06    U....... ........ -....... ........
 0000                                                                   ..                                 
[PIECE #1] : TYPE(COMMIT), TRANS_ID(83D0018003A), TIME(2020-10-12 14:26:07.269999), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,24,2109)
 3A0018003D080000 00000000FFFFFFFF 0400ED0200000000 5503000000000000    :...=... ........ ........ U.......
 1800000000000000 2D05000000000000 5503000000000000 1700000000000000    ........ -....... U....... ........
 2D05000000000000 63F0000000000000 6F9CA05F73B10500 CF02000011000000    -....... c....... o.._s... ........
 0000000000000000  
 

 G2N2
 [LOG #41] : BLOCK(3293), LSN(272926), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A8006A003A), TRANS_SEQ(1556), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
 5503000000000000 2400000000000000 C605000000000000 0100000000003706    U....... $....... ........ ......7.
 0000                                                                   ..                                 
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A8006A003A), TIME(2020-10-12 14:26:36.482417), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,106,936)
 3A006A00A8030000 00000000FFFFFFFF 0400050300000000 5503000000000000    :.j..... ........ ........ U.......
 2400000000000000 C605000000000000 5503000000000000 2300000000000000    $....... ........ U....... #.......
 C605000000000000 1D2A040000000000 715B5E6173B10500 0606000019000000    ........ .*...... q[^as... ........
 0000000000000000    
 
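 The commit redo record is 104 bytes long. Bytes 49-56 hold the GCN (5503000000000000), and the next two 8-byte fields hold the DCN (1700000000000000) and the LCN (2D05000000000000).
x86 CPUs store integers little-endian, with the least significant byte first, so 5503 has to be read as 0x0355, which is 853 in decimal. G2N1 is therefore at GCN 853 and DCN 23. G2N2 is
also at GCN 853, but its DCN is 35, which is 12 in-group transactions ahead of G2N1. In other words, G2N2 holds more data than G2N1.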
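
The byte swapping can be checked with a few lines of shell. This is a sketch of the conversion only, assuming the 8-byte fields are copied from the dump as 16 hex digits; le_hex_to_dec is an illustrative helper name.

```shell
#!/bin/sh
# Convert a little-endian hex field, as printed by gdump, to decimal.
le_hex_to_dec() {
    hex="$1"
    reversed=""
    while [ -n "$hex" ]; do
        # Take the first byte (two hex digits) and prepend it, which
        # reverses the byte order one byte at a time.
        byte=$(printf '%s' "$hex" | cut -c1-2)
        reversed="${byte}${reversed}"
        hex=$(printf '%s' "$hex" | cut -c3-)
    done
    # printf accepts a 0x-prefixed constant and prints it in decimal.
    printf '%d\n' "0x${reversed}"
}

le_hex_to_dec 5503000000000000   # GCN on both nodes: 853
le_hex_to_dec 1700000000000000   # DCN on G2N1: 23
le_hex_to_dec 2300000000000000   # DCN on G2N2: 35
```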
 

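So when the cluster was started directly, G2N1 failed to join because the data within group 2 is inconsistent. The two nodes have to be brought into agreement first.

There are two approaches: (1) G2N2 discards the data it has beyond G2N1, or (2) G2N1 is filled in with the data it is missing. Normally filling in the data is preferable, since data is valuable. This walkthrough tries both.

Recovery by discarding data

Start each node separately in LOCAL OPEN state, then check that the GCN is the same on all nodes and that the DCN is the same within each group.

From the steps above we know that group 1 is already consistent. Group 2 still has to be made consistent, and G2N2 holds more data than G2N1, so an incomplete recovery of G2N2 up to G2N1's last redo record is enough.

Find the LSN that G2N2 has to be recovered to, that is, the record carrying the same GCN+DCN combination as G2N1's last commit (5503000000000000 1700000000000000)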

[goldilocks@gs08 wal]$ gdump log redo_3_0.log -n 272885|grep -B  5 "5503000000000000 1700000000000000"

...

[LOG #17] : BLOCK(3269), LSN(272902), SIZE(170), PIECE_COUNT(2), TRANS_ID(3A80046003A), TRANS_SEQ(1544), RID(0,0,103)
[PIECE #0] : TYPE(SEGMENT_GLOBAL_SCN), SIZE(34), CLASS(SEGMENT), REDO_TYPE(MULTIPLE_PAGE), PROPAGATE_LOG(NO), RID(0,-1,0)
 5503000000000000 1800000000000000 C605000000000000 0100000000003706    U....... ........ ........ ......7.
 0000                                                                   ..                                 
[PIECE #1] : TYPE(COMMIT), TRANS_ID(3A80046003A), TIME(2020-10-12 14:26:07.272916), SIZE(104), CLASS(TRANSACTION), REDO_TYPE(TRANSACTION), PROPAGATE_LOG(YES), RID(1,70,936)
 3A004600A8030000 00000000FFFFFFFF 0400ED0200000000 5503000000000000    :.F..... ........ ........ U.......
 1800000000000000 C605000000000000 5503000000000000 1700000000000000    ........ ........ U....... ........

This record matches, and its LSN is 272902, so G2N2 has to be recovered to LSN 272902
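
The grep above also returns non-commit records that happen to contain the same bytes, so the COMMIT record has to be picked out by eye. A sketch that narrows the search to COMMIT pieces, assuming the gdump log output format shown in this post; find_commit_lsn is an illustrative helper name, and its result should still be checked against the dump before it is used for recovery.

```shell
#!/bin/sh
# Read "gdump log" output on stdin and print the LSN of every log record whose
# COMMIT piece contains "<gcn_hex> <dcn_hex>" in its hex dump.
find_commit_lsn() {
    awk -v pat="$1 $2" '
        # A [LOG #n] header starts a new record: remember its LSN.
        /\[LOG #[0-9]+\]/ && match($0, /LSN\([0-9]+\)/) {
            lsn = substr($0, RSTART + 4, RLENGTH - 5)
            in_commit = 0
            next
        }
        # A [PIECE #n] header tells whether the following hex lines are a COMMIT.
        /\[PIECE #[0-9]+\]/ {
            in_commit = (index($0, "TYPE(COMMIT)") > 0)
            next
        }
        # Hex dump line inside a COMMIT piece: report each matching LSN once.
        in_commit && lsn != "" && index($0, pat) && !(lsn in seen) {
            seen[lsn] = 1
            print lsn
        }
    '
}
```

With the dump from this section, `gdump log redo_3_0.log -n 272885 | find_commit_lsn 5503000000000000 1700000000000000` is the intended usage.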
[goldilocks@gs08 wal]$ gsql

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)


Connected to an idle instance.

gSQL> startup mount

Startup success

gSQL> alter database recover until change 272902;

Database altered.

gSQL> alter system open local database  resetlogs;

System altered.

gSQL> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
853.24.1492         

1 row selected.

G2N1's SCN was found earlier to be 853.24.1339, so both nodes are now at GCN 853 and DCN 24 and the two nodes of group 2 are consistent.
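
Since a mistyped LSN would recover the node to the wrong point, it can help to generate the statements from a validated number. The sketch below only prints the three statements used in the session above; recover_until_lsn_sql is an illustrative helper name, and whether gsql accepts startup mount on stdin is an assumption that has not been verified here, so the output is meant to be reviewed and entered at the gSQL prompt.

```shell
#!/bin/sh
# Print the gSQL statements for an incomplete recovery of one node up to the
# given LSN. Nothing is executed against the database.
recover_until_lsn_sql() {
    lsn="$1"
    # Reject anything that is not a plain non-negative integer.
    case "$lsn" in
        ''|*[!0-9]*)
            echo "LSN must be a non-negative integer" >&2
            return 1
            ;;
    esac
    cat <<EOF
startup mount
alter database recover until change ${lsn};
alter system open local database resetlogs;
EOF
}
```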

All four nodes are now in LOCAL OPEN state, the GCN is the same on every node, and the DCN matches within each group, so open global database should succeed.

[goldilocks@gs05 ~]$ sh select.sh "select status from x\$instance@local;"|grep -v "^$"
192.168.149.131
STATUS    
----------
LOCAL OPEN
1 row selected.
192.168.149.132
STATUS    
----------
LOCAL OPEN
1 row selected.
192.168.149.133
STATUS    
----------
LOCAL OPEN
1 row selected.
192.168.149.134
STATUS    
----------
LOCAL OPEN
1 row selected.
[goldilocks@gs05 ~]$ sh select.sh "select statement_view_scn() from dual;"|grep -v "^$"
192.168.149.131
STATEMENT_VIEW_SCN()
--------------------
853.26.488          
1 row selected.
192.168.149.132
STATEMENT_VIEW_SCN()
--------------------
853.26.1066         
1 row selected.
192.168.149.133
STATEMENT_VIEW_SCN()
--------------------
853.24.1339         
1 row selected.
192.168.149.134
STATEMENT_VIEW_SCN()
--------------------
853.24.1492         
1 row selected.
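
The precondition stated before this output (same gcn on every node, same dcn within each group) can be checked mechanically. A minimal sketch, assuming the SCN string reads gcn.dcn.local-part as the article's usage suggests, and that the input has been prepared by hand as lines of member name, group id, and SCN (check_scn_consistency is a hypothetical helper, not a GOLDILOCKS tool):

```shell
# Sketch only. Input on stdin, one node per line: <member> <group_id> <scn>
# Exit status 0 if every node has the same gcn and every group has one dcn.
check_scn_consistency() {
  awk '
    {
      split($3, scn, ".")                 # scn[1] = gcn, scn[2] = dcn (assumed)
      if (NR == 1) gcn = scn[1]           # remember the first gcn seen
      if (scn[1] != gcn) bad = 1          # gcn must match on all nodes
      if (($2 in dcn) && dcn[$2] != scn[2]) bad = 1   # dcn must match per group
      dcn[$2] = scn[2]
    }
    END { exit bad + 0 }
  '
}
```

Feeding it the four values shown above (G1N1 1 853.26.488, G1N2 1 853.26.1066, G2N1 2 853.24.1339, G2N2 2 853.24.1492) gives exit status 0, while a group whose two nodes report different dcn values gives a non-zero status.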

Run on g1n1:

gSQL> alter system open global database;

System altered.

gSQL> select LOCAL_MEMBER_NAME,STATUS from x$instance;

LOCAL_MEMBER_NAME STATUS
----------------- ------
G1N1              OPEN  
G2N2              OPEN  
G2N1              OPEN  
G1N2              OPEN  

4 rows selected.

gSQL> select count(*) from t1;

COUNT(*)
--------
      50

1 row selected.

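At this point every node in the database cluster has been restored from the backup to a normal state.

Recovery method that fills in the missing data

In theory, appending the portion of G2N2's current redo log that G2N1 lacks to G2N1's redo log would make the two nodes consistent, but that is cumbersome in practice. We use a different approach instead: rebalancing the data (rebalance database).

Restore the environment to its previous state

First shut down the database cluster, then remove the current data files and copy the backed-up data files into the data directory.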
gSQL> \cshutdown
Shutdown success

run_cmd.sh "rm -r ~/goldilocks_data/*"
run_cmd.sh "cp -r ~/backup/*  ~/goldilocks_data/"

Try a one-step cluster startup:

gSQL> \cstartup

ERR-42000(16403): of the total '4' nodes, '1' nodes failed to join the global database
Startup success

One node failed to join the cluster:

gSQL> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN 
----------- --------- ------------------ ------------------- ----------
G1N1                1 ACTIVE             ACTIVE              853.26.489
G1N2                6 ACTIVE             ACTIVE              853.26.0  
G2N1                5 INACTIVE           INACTIVE            -1.-1.-1  
G2N2                3 ACTIVE             ACTIVE              853.36.0  

4 rows selected.
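
When this check is scripted, the members that failed to join can be picked out of the query output by column position. A small sketch, assuming the column order of the query above (MEMBER_NAME, MEMBER_ID, LOGICAL_CONNECTION, PHYSICAL_CONNECTION, LOCAL_SCN); inactive_members is a hypothetical name:

```shell
# Sketch only. Reads the gSQL result table on stdin and prints the name of
# every member whose logical or physical connection is INACTIVE.
inactive_members() {
  awk '$3 == "INACTIVE" || $4 == "INACTIVE" { print $1 }'
}
```

Header, separator, and "rows selected" lines pass through the filter without matching, so the raw query output can be piped in unchanged.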

G2N1 did not join the cluster, so try joining it manually:

[goldilocks@gs07 ~]$ gsql

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)


Connected to GOLDILOCKS Database.

gSQL> select * from x$instance@local;

VERSION                            STARTUP_TIME               STATUS     OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ---------- ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099212 LOCAL OPEN       1007 TRUE                    2               5 G2N1                                  1

1 row selected.

gSQL> alter system join database;

ERR-42000(16405): of the total '6' tables in the database, '3' tables need to be rebalanced
System altered.

gSQL> alter database rebalance;

Database altered.

gSQL> select * from x$instance;

VERSION                            STARTUP_TIME               STATUS OS_USER_ID IS_CLUSTER LOCAL_GROUP_ID LOCAL_MEMBER_ID LOCAL_MEMBER_NAME LOCAL_MEMBER_POSITION
---------------------------------- -------------------------- ------ ---------- ---------- -------------- --------------- ----------------- ---------------------
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099212 OPEN         1007 TRUE                    2               5 G2N1                                  1
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.100691 OPEN         1007 TRUE                    2               3 G2N2                                  2
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:14.099467 OPEN         1005 TRUE                    1               6 G1N2                                  3
Release 20c 20.1.1 revision(31618) 2020-10-15 21:25:13.453708 OPEN         1008 TRUE                    1               1 G1N1                                  0

4 rows selected.

gSQL> select count(*) from t1;

COUNT(*)
--------
      62

1 row selected.

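All nodes in the cluster are now back to normal, and more data was recovered than with the previous method (62 rows in t1 instead of 50).

Summary

Comparing the two methods, the second recovers more data and is also simpler to carry out.

Incremental backup (level backup)

Besides full backups, GOLDILOCKS also supports incremental backups, from level 0 through level 4.

Properties related to backup: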

gSQL> select property_name,property_value,init_value,is_deprecated from v$PROPERTY where PROPERTY_NAME like '%BACKUP%';

PROPERTY_NAME                        PROPERTY_VALUE                          INIT_VALUE                              IS_DEPRECATED
------------------------------------ --------------------------------------- --------------------------------------- -------------
BACKUP_DIR                           /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup TRUE         
DEFAULT_REMOVAL_OBSOLETE_BACKUP_LIST NO                                      NO                                      FALSE        
DEFAULT_REMOVAL_BACKUP_FILE          NO                                      NO                                      FALSE        
READABLE_BACKUP_DIR_COUNT            1                                       1                                       FALSE        
BACKUP_DIR_1                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_2                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_3                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_4                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_5                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_6                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_7                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_8                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_9                         /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
BACKUP_DIR_10                        /home/goldilocks/goldilocks_data/backup /home/goldilocks/goldilocks_data/backup FALSE        
INCREMENTAL_BACKUP_SCAN_BUFFER_SIZE  32                                      32                                      FALSE   

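The location of incremental backups is set with BACKUP_DIR_1 through BACKUP_DIR_10 (BACKUP_DIR is deprecated). BACKUP_DIR_1 cannot be set at the session level, and a system-level change takes effect only after the database is restarted. By default BACKUP_DIR_1 points to ${GOLDILOCKS_DATA}/backup; a production system usually needs a dedicated backup path.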

gSQL> alter system set BACKUP_DIR_1 ='/home/goldilocks/increment_backup' scope=file;
gSQL> \cshutdown
gSQL> \cstartup

gSQL> select property_name,property_value,init_value,is_deprecated from v$PROPERTY where PROPERTY_NAME = 'BACKUP_DIR_1';

PROPERTY_NAME PROPERTY_VALUE                    INIT_VALUE                        IS_DEPRECATED
------------- --------------------------------- --------------------------------- -------------
BACKUP_DIR_1  /home/goldilocks/increment_backup /home/goldilocks/increment_backup FALSE        

Level 0 backup

This is the base level; backups at all other levels depend on a level 0 backup.

gSQL> alter database backup incremental level 0 ;

Database altered.

gSQL> select * from v$INCREMENTAL_BACKUP;

BACKUP_NAME                      BACKUP_SCOPE INCREMENTAL_LEVEL INCREMENTAL_TYPE    LSN BEGIN_TIME                 COMPLETION_TIME           
-------------------------------- ------------ ----------------- ---------------- ------ -------------------------- --------------------------
databaseD20201016T110117L0S2.inc database                     0 N/A              263709 2020-10-16 11:01:17.014740 2020-10-16 11:01:20.208249
controlD20201016T110120L0S2.inc  control                      0 N/A              263709 2020-10-16 11:01:20.223898 2020-10-16 11:01:20.226171
2 rows selected.

Look at the backup files that were generated:

[goldilocks@gs05 ~]$ run_cmd.sh "ls ~/increment_backup"
gs05
controlD20201016T110120L0S2.inc
databaseD20201016T110117L0S2.inc
----------------------------
gs06
controlD20201016T110118L0S1.inc
databaseD20201016T110116L0S1.inc
----------------------------
gs07
controlD20201016T110117L0S2.inc
databaseD20201016T110116L0S2.inc
----------------------------
gs08
controlD20201016T110121L0S2.inc
databaseD20201016T110116L0S2.inc
----------------------------
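
The file names appear to encode the backup scope, a D date and T time stamp (20201016T110117 matches BEGIN_TIME 2020-10-16 11:01:17 in v$INCREMENTAL_BACKUP), and the level after L. The meaning of the trailing S number is not stated in the output above, so the sketch below treats it as an opaque field. Only the two prefixes seen here, database and control, are handled; parse_backup_name is a hypothetical helper:

```shell
# Sketch only. Splits a backup file name such as
#   databaseD20201016T110117L0S2.inc
# into: scope, date, time, level, S-field (meaning of S not documented here).
# Returns non-zero if the name does not follow the pattern seen above.
parse_backup_name() (
  parsed=$(printf '%s\n' "$1" |
    sed -n -E 's/^(database|control)D([0-9]{8})T([0-9]{6})L([0-4])S([0-9]+)\.inc$/\1 \2 \3 \4 \5/p')
  [ -n "$parsed" ] && printf '%s\n' "$parsed"
)
```

For example, parse_backup_name databaseD20201016T110117L0S2.inc prints database 20201016 110117 0 2, which makes it easy to sort or filter backup files by level and time across nodes.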

Use the gdump tool to inspect the contents of a backup file:

[goldilocks@gs05 increment_backup]$ gdump backup databaseD20201016T110117L0S2.inc 

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)

===========================================================
FILE: databaseD20201016T110117L0S2.inc
TYPE: INCREMENTAL BACKUP
TIME: 2020-10-16 11:17:13.673234
===========================================================

 INCREMENTAL BACKUP FILE HEADER 
----------------------------------------------------------------------------
  Backup object                                      : DATABASE
  Tablespace count                                   : 6
  Backup body size                                   : 185942016
  Last checkpoint lsn of previous incremental backup : 0
  Max page lsn of incremental backup                 : 263709
  Last checkpoint lsn of incremental backup          : 263709
  Last checkpoint lid of incremental backup          : (15, 3710, 13)
  Database signature                                 : 691AA23C005B11EB89DBCBC21F1732DE
----------------------------------------------------------------------------

 INCREMENTAL BACKUP FILE TAIL
----------------------------------------------------------------------------
  Tablespace id(0) : backup page count(13022), start offset(8192)
  Tablespace id(1) : backup page count(2122), start offset(106684416)
  Tablespace id(2) : backup page count(2501), start offset(124067840)
  Tablespace id(4) : backup page count(17), start offset(144556032)
  Tablespace id(5) : backup page count(389), start offset(144695296)
  Tablespace id(6) : backup page count(4647), start offset(147881984)

===========================================================
TIME: 2020-10-16 11:17:13.673502

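The backup file contains tablespace information and checkpoint information; nothing about commit.log or the location file appears in it.

Recovery

Scenario 1: a single data file is damaged

Simulating the failure

Damage to a data file on a single node does not affect use of the cluster as a whole; restoring that one node from backup is enough.

Pick one node and simulate the failure by deleting one of its data files (system_data.dbf in this test), then simulate users continuing to use the database: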

gSQL> create table t2 (id int,name varchar(100)) sharding by hash (id ) shard count 2;

Table created.

gSQL> insert into t2 values (1,'after rm data file'),(2,'after rm data file');

1 row created.

gSQL> commit;

Commit complete.

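The cluster keeps working, and even the node whose data file was deleted reports no error: the data lives in memory and nothing has been flushed to disk yet, so the database cannot detect the problem. When we then trigger a checkpoint manually on that node, the error appears (checkpoints on the healthy nodes succeed).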

gSQL> alter system checkpoint;

ERR-HY000(14106): tablespace (MEM_DATA_TBS) is taken offline as the result of a write error
ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/db/system_data.dbf) : stfOpen() returned errno(2)
System altered.

gSQL> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';

IS_ONLINE
---------
FALSE    

gSQL> select count(*) from t1@local;

ERR-42000(14041): cannot access the OFFLINE tablespace 'MEM_DATA_TBS'

The tablespace is now offline, and queries against this node's data fail.

Recovery

Bring the faulty node up to the mount state:

gSQL> shutdown 

Shutdown success

gSQL> startup mount

Startup success

Meanwhile, continue using the database from a healthy node:

gSQL> insert into t2 values (3,'during g2n2 recovery'),(4,'during g2n2 recovery');

1 row created.

gSQL> commit;

Commit complete.

Since only one data file is damaged, restoring the tablespace that owns it is enough:

g2n2> alter database restore tablespace mem_data_tbs;

Database altered.
[2020-10-19 09:45:27.604274 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] begin (TABLESPACE, -1)

[2020-10-19 09:45:27.612729 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RESTORE] recreate datafile ( /home/goldilocks/goldilocks_data/db/system_data.dbf )

[2020-10-19 09:45:27.612763 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[DATABASE FILE MANAGER] remove database file (/home/goldilocks/goldilocks_data/db/system_data.dbf) - success

[2020-10-19 09:45:27.612773 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[DATABASE FILE MANAGER] register database file (/home/goldilocks/goldilocks_data/db/system_data.dbf) - success

[2020-10-19 09:45:27.728515 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RESTORE] recreate end

[2020-10-19 09:45:27.733664 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] LOAD DATAFILES

[2020-10-19 09:45:27.733726 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_dict.dbf' assigned to PARALLEL_IO_GROUP_1 

[2020-10-19 09:45:27.733741 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_undo.dbf' assigned to PARALLEL_IO_GROUP_1 

[2020-10-19 09:45:27.734024 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_aux.dbf' assigned to PARALLEL_IO_GROUP_1 

[2020-10-19 09:45:27.734269 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_dict.dbf)

[2020-10-19 09:45:28.275547 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_dict.dbf)

[2020-10-19 09:45:28.275651 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_undo.dbf)

[2020-10-19 09:45:28.312210 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_undo.dbf)

[2020-10-19 09:45:28.312297 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_aux.dbf)

[2020-10-19 09:45:28.520626 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_aux.dbf)

[2020-10-19 09:45:28.522207 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] REFINE TABLESPACE AND DATAFILE

[2020-10-19 09:45:28.637237 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_data.dbf' assigned to PARALLEL_IO_GROUP_1 

[2020-10-19 09:45:28.637435 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)

[2020-10-19 09:45:29.103868 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)

[2020-10-19 09:45:29.131161 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RESTORE] merge datafile begin ( backup file - /home/goldilocks/increment_backup/databaseD20201016T234607L0S2.inc )

[2020-10-19 09:45:29.133506 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RESTORE] merge datafile end

[2020-10-19 09:45:29.177209 INSTANCE(GOLDILOCKS) THREAD(19228,139887116277504)] [INFORMATION]
[PAGE FLUSHER] FLUSHED_LSN(-1)]

[2020-10-19 09:45:29.178274 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RESTORE] end

[2020-10-19 09:45:29.180503 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[EVENT] alter database restore : SUCCESS

restore tablespace appears to take the corresponding data from the level 0 backup and restore the tablespace to its state at backup time.

In the control file, the checkpoint LSN recorded for system_data.dbf is 37384:
 
  NAME          :  MEM_DATA_TBS
   ATTRIBUTES    :  MEMORY | PERSISTENT | DATA
   STATE         :  CREATED
   LOGGING STATE :  LOGGING
   ONLINE STATE  :  OFFLINE
   EXTENT_SIZE   :  32
   RELATION_ID   :  87467008983040

   [DATAFILE #0]
     SIZE      :  209715200
     AUTOEXTEN :  OFF
     NEXT      :  0
     MAXSIZE   :  209715200
     STATE     :  CREATED
     NAME      :  "/home/goldilocks/goldilocks_data/db/system_data.dbf"
     CHKPT LSN : 37384
     CHKPT LID : (1, 1960, 13)
After the restore, the checkpoint LSN in the header of system_data.dbf is 36395:

[goldilocks@gs08 db]$ gdump data system_data.dbf -h

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)

===========================================================
FILE: system_data.dbf
TYPE: DATAFILE HEADER
TIME: 2020-10-19 09:59:21.891048
===========================================================
  FILE                   : system_data.dbf
  Tablespace Physical Id : 2
  Datafile Id            : 0
  Last Checkpoint Lsn    : 36395
  Last Checkpoint Lid    : (0, 59107, 13)
  Creation TIME          : 2020-10-16 23:25:36.324841
  Database signature     : D5A450520FC311EBA5B049B54ADBD1B3

===========================================================
TIME: 2020-10-19 09:59:21.891889
===========================================================
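
To compare checkpoint positions in a script rather than by eye, the LSN can be pulled out of the gdump header output. A sketch that relies only on the "Last Checkpoint Lsn" label shown above (checkpoint_lsn is a hypothetical helper):

```shell
# Sketch only. Reads `gdump data <file> -h` output on stdin and prints the
# value of the first "Last Checkpoint Lsn" line.
checkpoint_lsn() {
  awk -F':' '/Last Checkpoint Lsn/ {
    gsub(/[ \t]/, "", $2)   # strip the padding around the number
    print $2
    exit
  }'
}
```

It would be used as gdump data system_data.dbf -h | checkpoint_lsn. A value lower than the checkpoint LSN recorded in the control file (37384 in this test) indicates that redo still has to be replayed for that data file.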

A restore must be followed by a recover, which replays redo from LSN 36395 up to the latest redo record:

g2n2> alter database recover tablespace mem_data_tbs;

Database altered.
[2020-10-19 10:08:07.879322 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] LOAD DATAFILES

[2020-10-19 10:08:07.880219 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[STARTUP-SM] REFINE TABLESPACE AND DATAFILE

[2020-10-19 10:08:07.905576 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
.... datafile '/home/goldilocks/goldilocks_data/db/system_data.dbf' assigned to PARALLEL_IO_GROUP_1 

[2020-10-19 10:08:07.905694 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)

[2020-10-19 10:08:08.012629 INSTANCE(GOLDILOCKS) THREAD(19228,139887097394944)] [INFORMATION]
.... LOAD COMPLETED - DATAFILE(/home/goldilocks/goldilocks_data/db/system_data.dbf)

[2020-10-19 10:08:08.058296 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] recover TABLESPACE - begin (COMPLETE, LSN 0)

[2020-10-19 10:08:08.058600 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] analysis begin

[2020-10-19 10:08:08.164847 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] read checkpoint log - checkpoint lid(0.59107.13), oldest lsn(36395), local scn(903.0.1046), time(2020-10-16 23:46:07.893176), transaction sequence(532), Grid sequence(12884902414)

[2020-10-19 10:08:08.191845 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] analysis done

[2020-10-19 10:08:08.191909 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] ready to redo - start lid(0.59107.0), start lsn(36395)

[2020-10-19 10:08:08.199126 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/archive_log/archive_0.log), lsn(36395)

[2020-10-19 10:08:08.205028 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
    [RECOVERY MANAGER] redo has performing - logfile(/home/goldilocks/goldilocks_data/wal/redo_1_0.log), lsn(36397)

[2020-10-19 10:08:08.209624 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[RECOVERY MANAGER] recover TABLESPACE - end

[2020-10-19 10:08:08.252564 INSTANCE(GOLDILOCKS) THREAD(19228,139887116277504)] [INFORMATION]
[PAGE FLUSHER] FLUSHED_LSN(-1)]

[2020-10-19 10:08:08.260469 INSTANCE(GOLDILOCKS) THREAD(19228,139887742981952)] [INFORMATION]
[EVENT] alter database recover : SUCCESS

After the recover, open the database and bring the tablespace online:

g2n2> alter system open local database;

System altered.

g2n2> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';

IS_ONLINE
---------
FALSE    

1 row selected.

g2n2> alter tablespace MEM_DATA_TBS online;

Tablespace altered.

g2n2> select IS_ONLINE from v$tablespace where TBS_NAME='MEM_DATA_TBS';

IS_ONLINE
---------
TRUE     

1 row selected.

g2n2> select count(*) from t1@local;

COUNT(*)
--------
      24

1 row selected.

g2n2> select * from t2@local;

ID NAME              
-- ------------------
 1 after rm data file

1 row selected.

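The data in t1 has been recovered, and t2 has one row. That row exists because, although the data file was damaged, the redo log had already recorded the operation. t2 was created with shard count 2 precisely so that, when two test rows were inserted together, one of them would land on the failed node.

Why only one row? Because the node has not yet rejoined the cluster, the rows written to the cluster during the recovery have not been synchronized to it.

At this point g2n2 has not yet joined the cluster: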

g2n2> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
923.0.1101 

g1n1> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
925.1.590  

g2n2 is already two global transactions behind the cluster (gcn 923 versus 925).
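
The lag is the difference between the first SCN components. As a tiny sketch (gcn_lag is a hypothetical helper, and reading the first component as the global commit number follows the article's own usage rather than vendor documentation):

```shell
# Sketch only. Usage: gcn_lag <scn_of_lagging_node> <scn_of_cluster>
# Prints how many global commit numbers the first SCN is behind the second.
gcn_lag() {
  echo $(( ${2%%.*} - ${1%%.*} ))   # keep only the part before the first dot
}
```

With the two values above, gcn_lag 923.0.1101 925.1.590 prints 2.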

Join g2n2 to the cluster:

g2n2> alter system join database;

ERR-42000(16405): of the total '11' tables in the database, '1' tables need to be rebalanced
System altered.

g2n2> alter database rebalance;

Database altered.

g2n2> select * from t2@local order by 1;

ID NAME                
-- --------------------
 1 after rm data file  
 3 during g2n2 recovery

2 rows selected.

g2n2> select * from t2 order by 1;

ID NAME                
-- --------------------
 1 after rm data file  
 2 after rm data file  
 3 during g2n2 recovery
 4 during g2n2 recovery

g2n2> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
934.0.1101          

1 row selected.

The SCN on another node of the cluster:
g1n1> select statement_view_scn() from dual;

STATEMENT_VIEW_SCN()
--------------------
934.0.590           

1 row selected.

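After joining the cluster, the new data has been synchronized through rebalance.

Scenario: control file is damaged

By default there are two copies of the control file, both stored under $GOLDILOCKS_DATA/wal. Keeping both in the same path gives little real redundancy, so they can be configured to live in different paths: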

 alter system set  CONTROL_FILE_1 ='/home/goldilocks/goldilocks_data/backup/control_1.ctl' scope =file;

Shut down the database, copy the file, and then restart the database:

 run_cmd.sh "cp ~/goldilocks_data/wal/control_1.ctl  ~/goldilocks_data/backup/control_1.ctl"
g1n1>  select PROPERTY_VALUE from v$PROPERTY where  property_name  like '%CONTROL_FILE_1%';

PROPERTY_VALUE                                       
-----------------------------------------------------
/home/goldilocks/goldilocks_data/backup/control_1.ctl

1 row selected.

Simulate loss of a control file:

rm /home/goldilocks/goldilocks_data/backup/control_1.ctl

Then trigger a checkpoint:
g1n1> alter system checkpoint;

System altered.

The node that is missing the control file goes down immediately:

[2020-10-19 14:51:34.611741 INSTANCE(GOLDILOCKS) THREAD(23155,139815192352512)] [INFORMATION]
[CHECKPOINT] begin

[2020-10-19 14:51:34.612304 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 0, datafile : 0 )

[2020-10-19 14:51:34.640728 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 1, datafile : 0 )

[2020-10-19 14:51:34.640876 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 2, datafile : 0 )

[2020-10-19 14:51:34.641185 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 5, datafile : 0 )

[2020-10-19 14:51:34.641577 INSTANCE(GOLDILOCKS) THREAD(23155,139814663874304)] [INFORMATION]
[IO SLAVE] flush datafile ( tablespace : 6, datafile : 0 )

[2020-10-19 14:51:34.646512 INSTANCE(GOLDILOCKS) THREAD(23155,139814644991744)] [INFORMATION]
[CHECKPOINT] flush buffer checkpoint list[0] - for checkpoint (1), system min flushed lsn(41666), min flushed lsn (41709), flushed page count(0)

[2020-10-19 14:51:34.656874 INSTANCE(GOLDILOCKS) THREAD(23155,139815154587392)] [INFORMATION]
[PAGE FLUSHER] flushed lsn(41709), flushed page count(1024)]

[2020-10-19 14:51:34.656972 INSTANCE(GOLDILOCKS) THREAD(23155,139814317840128)] [INFORMATION]
[ARCHIVING] stable lsn(41709)

[2020-10-19 14:51:34.667498 INSTANCE(GOLDILOCKS) THREAD(23155,139814317840128)] [FATAL]
[SYSTEM FATAL] CAUSE("can't save control file")(GOLDILOCKS) ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/backup/control_1.ctl): stfOpen() returned errno(2)

========================================================
(GOLDILOCKS) CALL STACK [Release 20c 20.1.1 revision(31618)]
========================================================
gmaster(kngAddErrorCallStackUnsafe+0xa3) [0xb60353]
gmaster(knlLogCallStackUnsafe+0x2a) [0xb3a13a]
gmaster(knlSystemFatal+0x35) [0xb475c5]
gmaster(smfSaveCtrlFile+0x76) [0xa6e066]
gmaster(smrArchivelogEventHandler+0x38) [0x9c5448]
gmaster(knlExecuteEnvEvent+0xb0) [0xb46740]
gmaster(ztmtArchivelogThread+0x320) [0x520700]
/lib64/libpthread.so.0(+0x7dd5) [0x7f2950d02dd5]
/lib64/libc.so.6(clone+0x6d) [0x7f295082402d]
(GOLDILOCKS) ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/backup/control_1.ctl): stfOpen() returned errno(2)

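To recover, copy the intact control file to the missing location, restart the database, and then join the node to the cluster.

If both copies of the control file are damaged, the cause is usually not the software; accidental deletion of the whole directory or a corrupted file system on the server is more likely. In that case the entire node has to be recovered; see the scenario where a single node loses all of its data.

Scenario 2: a single node loses all of its data

Accidentally deleting the whole database directory, file system corruption on the server, failure of multiple disks, and similar events can cause all of the data to be lost.

There are two approaches.

Recovering from a level 0 backup

When to use it: the backup is fairly recent, recovering costs less than rejoining the cluster as a new member, and no archive log has been lost.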
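
The last condition can be checked before starting. A minimal sketch, assuming archive files follow the archive_N.log naming seen elsewhere in this article and that the first and last sequence numbers to check are supplied by the operator (missing_archive_seqs is a hypothetical helper):

```shell
# Sketch only. Usage: missing_archive_seqs <archive_dir> <first_seq> <last_seq>
# Prints every sequence number in the range that has no archive_<seq>.log file.
missing_archive_seqs() (
  dir=$1
  seq=$2
  last=$3
  while [ "$seq" -le "$last" ]; do
    [ -f "$dir/archive_${seq}.log" ] || echo "$seq"
    seq=$((seq + 1))
  done
)
```

Empty output means the range is contiguous; any number printed marks a gap that would stop media recovery at that point.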

Simulating the failure

Delete the entire data directory on g1n2:

rm -rf ~/goldilocks_data 
g1n1> insert into t2 values (5,'after rm all g1n2 data file'),(6,'after rm all g1n2 data file');

2 rows created.

g1n1> commit;

Commit complete.
Recovery
[goldilocks@gs06 ~]$ gsql

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)

Connected to an idle instance.

g1n2> startup nomount;
ERR-HY000(13025): Property file does not exist (/home/goldilocks/goldilocks_data/conf/goldilocks.properties.conf)

The property file is missing, so copy the conf directory over from a healthy node.

g1n2> startup nomount;

ERR-HY000(13004): static shared memory segment is corrupted
ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/trc/system.trc): stfOpen() returned errno(2)

Create the db, wal, trc, backup, and archive_log directories under $GOLDILOCKS_DATA:
mkdir wal db trc backup archive_log
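
The same step can be written so that it does not depend on the current directory and is safe to repeat. A small sketch (prepare_data_dir is a hypothetical helper; the directory list is the one given above):

```shell
# Sketch only. Usage: prepare_data_dir "$GOLDILOCKS_DATA"
# Creates the sub-directories the instance expected in this test.
prepare_data_dir() {
  for sub in db wal trc backup archive_log; do
    mkdir -p "$1/$sub" || return 1   # -p: no error if it already exists
  done
}
```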

g1n2> startup nomount;

Startup success

g1n2> alter system mount database;

ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/control_0.ctl'

g1n2> alter database restore controlfile from '/home/goldilocks/increment_backup/controlD20201019T160304L0S4.inc';

Database altered.

g1n2> alter system mount database;

ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/commit.log'

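Purpose of the commit log: when the MEM_TRANS_TBS tablespace is full, the overflowing transaction records are stored in the commit log. The file's contents do not matter during recovery, so a zero-filled placeholder is enough: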
dd if=/dev/zero of=commit.log bs=1048576 count=100

g1n2> alter system mount database;

ERR-HY000(11040): No such object (/home/goldilocks/goldilocks_data/wal/location.ctl) : stfOpen() returned errno(2)
location.ctl does not exist, so copy one from another node.

g1n2> alter system mount database;

System altered.

g1n2> alter database restore;

Database altered.

g1n2> alter database recover;

ERR-42000(14046): file does not exist - '/home/goldilocks/goldilocks_data/wal/redo_0_0.log'

Check the current log files:
g1n2> select * from v$logfile;

GROUP_ID FILE_NAME                                         GROUP_STATE FILE_SEQ FILE_SIZE
-------- ------------------------------------------------- ----------- -------- ---------
       0 /home/goldilocks/goldilocks_data/wal/redo_0_0.log CURRENT            4 104857600
       1 /home/goldilocks/goldilocks_data/wal/redo_1_0.log INACTIVE           1 104857600
       2 /home/goldilocks/goldilocks_data/wal/redo_2_0.log INACTIVE           2 104857600
       3 /home/goldilocks/goldilocks_data/wal/redo_3_0.log INACTIVE           3 104857600

4 rows selected.

Recreate the redo logs from the archive logs, copying archive_<FILE_SEQ>.log to the redo file of the matching group:

g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_4.log /home/goldilocks/goldilocks_data/wal/redo_0_0.log

g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_1.log /home/goldilocks/goldilocks_data/wal/redo_1_0.log

g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_2.log /home/goldilocks/goldilocks_data/wal/redo_2_0.log

g1n2> ! cp /home/goldilocks/goldilocks_data/archive_log/archive_3.log /home/goldilocks/goldilocks_data/wal/redo_3_0.log
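
The mapping used here pairs each GROUP_ID from v$logfile with its FILE_SEQ. A sketch that prints the copy commands for review instead of running them, assuming the archive_N.log and redo_G_0.log naming seen above and input lines of the form "group_id file_seq" typed in from the v$logfile output (gen_redo_restore_cmds is a hypothetical helper):

```shell
# Sketch only. Usage: gen_redo_restore_cmds <data_dir> < pairs.txt
# Each input line is "<GROUP_ID> <FILE_SEQ>". The cp commands are only
# printed, so they can be checked before being executed by hand.
gen_redo_restore_cmds() (
  data_dir=$1
  while read -r group seq; do
    [ -n "$group" ] || continue   # skip blank lines
    printf 'cp %s/archive_log/archive_%s.log %s/wal/redo_%s_0.log\n' \
      "$data_dir" "$seq" "$data_dir" "$group"
  done
)
```

Given the pairs 0 4, 1 1, 2 2, and 3 3 from the listing above and /home/goldilocks/goldilocks_data as the data directory, it prints the same four cp commands that were run manually.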


g1n2> alter database begin incomplete recovery;

ERR-01000(14104): Warning: suggestion '/home/goldilocks/goldilocks_data/archive_log/archive_4.log'
ERR-01000(14103): Warning: media recovery needs a logfile including log (Lsn 45702)
Database altered.

g1n2> alter database  recover automatically;

ERR-01000(14104): Warning: suggestion '/home/goldilocks/goldilocks_data/archive_log/archive_5.log'
ERR-01000(14103): Warning: media recovery needs a logfile including log (Lsn 46421)
Database altered.

g1n2> alter database  end incomplete recovery;

Database altered.

g1n2> alter system open local database resetlogs;

System altered.

g1n2> alter system join database;

ERR-42000(16405): of the total '11' tables in the database, '10' tables need to be rebalanced
System altered.

g1n2> alter database rebalance;

Database altered.


g1n2> select * from t2 order by 1;

ID NAME                       
-- ---------------------------
 1 after rm data file         
 2 after rm data file         
 3 during g2n2 recovery       
 4 during g2n2 recovery       
 5 after rm all g1n2 data file
 6 after rm all g1n2 data file

6 rows selected.

Recovery is complete:
g1n2> select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN  
----------- --------- ------------------ ------------------- -----------
G1N1                1 ACTIVE             ACTIVE              1138.0.0   
G1N2                6 ACTIVE             ACTIVE              1138.0.1264
G2N1                5 ACTIVE             ACTIVE              1138.0.0   
G2N2                7 ACTIVE             ACTIVE              1138.0.0   

4 rows selected.

Joining the cluster as a new member

When to use it: the backup was lost together with the data, so there is no way to restore the data locally.

Delete the database files on g1n2:
rm archive_log/* backup/* db/* wal/*

Check the cluster status:

g1n1>  select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN 
----------- --------- ------------------ ------------------- ----------
G1N1                1 ACTIVE             ACTIVE              1140.0.674
G1N2                6 INACTIVE           INACTIVE            -1.-1.-1  
G2N1                5 ACTIVE             ACTIVE              1140.0.0  
G2N2                7 ACTIVE             ACTIVE              1140.0.0 

Recreate the database on g1n2:

[goldilocks@gs06 goldilocks_data]$ gcreatedb --cluster --db_name='glodilocks' --timezone='+08:00' --character_set="UTF8" --char_length_units="OCTETS" --member='g1n2' --host=192.168.149.132 --port=10120

 Copyright © 2010 SUNJESOFT Inc. All rights reserved.
 Release 20c 20.1.1 revision(31618)

Database created

g1n1> alter database drop inactive cluster  members;

Database altered.

g1n1>  select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN 
----------- --------- ------------------ ------------------- ----------
G1N1                1 ACTIVE             ACTIVE              1142.0.674
G2N1                5 ACTIVE             ACTIVE              1142.0.0  
G2N2                7 ACTIVE             ACTIVE              1142.0.0  

3 rows selected.

On the new node:
g1n2> startup

Startup success

g1n2> alter system open global database;

System altered.


Add the node to the cluster:
g1n1> alter cluster group g1 add cluster member g1n2 host '192.168.149.132' port 10120;

Cluster Group altered.


The new node needs to rebalance data:

g1n2> alter database rebalance;

Database altered.

g1n2> select * from t2@local order by 1;

ID NAME                       
-- ---------------------------
 2 after rm data file         
 4 during g2n2 recovery       
 6 after rm all g1n2 data file

3 rows selected.

g1n1>  select a.member_name,a.member_id ,b.LOGICAL_CONNECTION,b.PHYSICAL_CONNECTION ,b.LOCAL_SCN from cluster_member@local a, x$cluster_member@local b where a.MEMBER_ID=b.MEMBER_ID order by a.GROUP_ID, 1;

MEMBER_NAME MEMBER_ID LOGICAL_CONNECTION PHYSICAL_CONNECTION LOCAL_SCN 
----------- --------- ------------------ ------------------- ----------
G1N1                1 ACTIVE             ACTIVE              1155.0.675
G1N2                8 ACTIVE             ACTIVE              1155.0.0  
G2N1                5 ACTIVE             ACTIVE              1155.0.0  
G2N2                7 ACTIVE             ACTIVE              1155.0.0  

The cluster is back to normal.
