PGPool-II + PG streaming replication: HA with automatic primary/standby failover

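PG streaming replication supports hot-standby switchover, but the promotion has to be triggered manually by creating a trigger file. Some HA scenarios need the standby to take over automatically when the primary goes down, and pgpool-II can provide this. This article builds primary/standby failover with pgpool-II on top of PG streaming replication. Before configuring pgpool, install the PG database on both planned machines and set up streaming replication between them; for the replication setup see the earlier article: http://www.lxweimin.com/p/12bc931ebba3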

[Figure: pgpool two-node cluster architecture]

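  The figure above shows the pgpool-based two-node cluster. The PG primary and standby nodes form a streaming-replication hot standby pair. pgpool1 and pgpool2 act as middleware: they add both PG nodes to the cluster and provide read/write splitting, load balancing, and automatic HA failover. The two pgpool nodes share a delegated virtual IP that applications connect to, and they monitor each other through watchdog. When pgpool1 goes down, pgpool2 automatically takes over the virtual IP and keeps serving clients without interruption.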

1 Host planning

Hostname | IP | Role | Port
:----:|:----:|:----:|:----:
master | 192.168.0.108 | PGMaster | 5432
master | 192.168.0.108 | pgpool1 | 9999
slave | 192.168.0.109 | PGSlave | 5432
slave | 192.168.0.109 | pgpool2 | 9999
vip | 192.168.0.150 | virtual IP | 9999

With the host plan in place, add the host names to the hosts file on both master and slave:

[root@localhost ~]# vi /etc/hosts
# Add the following entries:
192.168.0.108 master
192.168.0.109 slave
192.168.0.150 vip
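
As a quick sanity check, the three entries can be verified with a small shell function. This is only a sketch: the function name `check_hosts_entries` is hypothetical, and the names and addresses are the ones from the host plan above.

```shell
#!/bin/sh
# check_hosts_entries [FILE]: verify that FILE (default /etc/hosts) maps the
# three planned names to the planned addresses. Hypothetical helper.
check_hosts_entries() {
    file=${1:-/etc/hosts}
    missing=0
    for pair in "192.168.0.108 master" "192.168.0.109 slave" "192.168.0.150 vip"; do
        ip=${pair% *}
        name=${pair#* }
        # Look for a line whose first field is the IP and which lists the name.
        if ! awk -v ip="$ip" -v name="$name" '
                $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit !found }' "$file"; then
            echo "missing: $ip $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_hosts_entries` on each machine prints nothing and returns 0 when all three names are present.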

2 Configure SSH keys

Generate an SSH key pair as the postgres user on both master and slave:

[root@localhost ~]# su - postgres
[postgres@localhost ~]$ ssh-keygen -t rsa
[postgres@localhost ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[postgres@localhost ~]$ chmod 600 ~/.ssh/authorized_keys

Then install master's public key on slave and slave's public key on master. ssh-copy-id appends the key to the peer's authorized_keys file rather than overwriting it.

# on master
[postgres@localhost ~]$ ssh-copy-id postgres@slave
# on slave
[postgres@localhost ~]$ ssh-copy-id postgres@master

Verify that the SSH setup works:

# on master
[postgres@master ~]$ ssh postgres@slave
Last login: Tue Dec 20 21:22:50 2016 from master
# on slave
[postgres@slave ~]$ ssh postgres@master
Last login: Tue Dec 20 21:22:50 2016 from slave

Both logins succeed, so the SSH trust relationship is in place.
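
The failover script later runs ssh without a terminal, so it can help to test the trust relationship non-interactively as well. A minimal sketch, assuming OpenSSH; `check_trust` and the `SSH_BIN` override are hypothetical names used only for illustration:

```shell
#!/bin/sh
# check_trust USER@HOST: succeed only if key-based login works without a prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
# SSH_BIN is a hypothetical override so the function can be dry-run.
check_trust() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, `check_trust postgres@slave && echo trusted` on master, and the mirror-image call on slave.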

3 Install pgpool

A Chinese-language configuration reference is available at http://pgpool.projects.pgfoundry.org/pgpool-II/doc/pgpool-zh_cn.html

# Download pgpool
[root@master opt]#   wget http://www.pgpool.net/mediawiki/images/pgpool-II-3.6.0.tar.gz
# Unpack
[root@master opt]#   tar -zxvf pgpool-II-3.6.0.tar.gz
# Give ownership to postgres (pgpool does not have to be installed under the postgres account; it is just convenient because the SSH keys were set up for postgres)
[root@master opt]#   chown -R postgres.postgres /opt/pgpool-II-3.6.0
[root@master ~]# su - postgres
[postgres@master ~]$  cd /opt/pgpool-II-3.6.0
[postgres@master pgpool-II-3.6.0]$  ./configure --prefix=/opt/pgpool --with-pgsql=/home/postgres
[postgres@master pgpool-II-3.6.0]$  make
[postgres@master pgpool-II-3.6.0]$  make install

Installing the pgpool helper functions is optional, but recommended for stability.
This installs pgpool_regclass and pgpool_recovery:

[postgres@master pgpool-II-3.6.0]$  cd src/sql
[postgres@master sql]$  make
[postgres@master sql]$  make install
[postgres@master sql]$  psql -f insert_lock.sql

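Installation is complete.

4 Configure pgpool

4.1 Configure the pgpool environment variables

pgpool is installed under the postgres account, so add the environment variables to that account. Do this on both master and slave.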

[postgres@master ~]$ cd /home/postgres
[postgres@master ~]$ vim .bashrc
# Add the following
PGPOOLHOME=/opt/pgpool
export PGPOOLHOME
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$PGHOME/bin:$PGPOOLHOME/bin
export PATH

4.2 Configure pool_hba.conf

pool_hba.conf authenticates connecting users and must be consistent with PG's pg_hba.conf: either both use trust or both use md5. md5 is used here:

[postgres@master ~]$ cd /opt/pgpool/etc
[postgres@master etc]$ cp pool_hba.conf.sample pool_hba.conf
[postgres@master etc]$ vim pool_hba.conf
# Edit it as follows
# "local" is for Unix domain socket connections only
local   all         all                            md5
# IPv4 local connections:
host    all         all         0.0.0.0/0          md5
host    all         all         0/0                md5

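4.3 Configure pcp.conf

pcp.conf holds the credentials for pgpool's own administration interface; the pgpool management tools ask for this user name and password. Configure it as follows: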

[postgres@master ~]$ cd /opt/pgpool/etc
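[postgres@master etc]$ cp pcp.conf.sample pcp.conf
# Generate the MD5 hash of the pcp password with pg_md5
[postgres@master etc]$ pg_md5 nariadmin
6b07583ba8af8e03043a1163147faf6a
# pcp.conf holds the pgpool administrator's own user name and password, used to manage the cluster.
[postgres@master etc]$ vim pcp.conf
# Add the following
postgres:6b07583ba8af8e03043a1163147faf6a
# Save and quit.
# Register the PG database user name and password with pgpool
[postgres@master etc]$ pg_md5 -p -m -u postgres pool_passwd
# The database login user is postgres; enter its login password here and make sure it is correct
# After the password is entered, a pool_passwd file is generated in the pgpool/etc directory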
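
Each pcp.conf line has the form `username:md5hash`. As an illustration of that format, the sketch below builds such a line with `md5sum` from GNU coreutils. It assumes that `pg_md5 PASSWORD` prints the plain MD5 hex digest of the password; `pcp_conf_line` is a hypothetical helper name, and pg_md5 remains the tool the article actually uses.

```shell
#!/bin/sh
# pcp_conf_line USER PASSWORD: print a "user:md5hex" line in pcp.conf format.
# Assumption: the hash is the plain MD5 digest of the password (no salt).
pcp_conf_line() {
    hash=$(printf '%s' "$2" | md5sum | awk '{print $1}')
    printf '%s:%s\n' "$1" "$hash"
}
```

Note that passing a password on the command line leaves it in the shell history, so the interactive `pg_md5 -p` form is preferable on shared machines.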

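4.4 Configure system command permissions

Set the setuid bit on ifconfig and arping so that non-root users can run them. pgpool, running as postgres, needs both commands to bring the virtual IP up and down.

[root@master ~]# chmod u+s /sbin/ifconfig
[root@master ~]# chmod u+s /usr/sbin/arping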
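
Whether the setuid bit is really set can be checked with the shell's `-u` file test. A small sketch; `report_setuid` is a hypothetical helper, and the paths passed to it should be the ones used on the local system:

```shell
#!/bin/sh
# report_setuid FILE...: print whether each file has the setuid bit set.
report_setuid() {
    for f in "$@"; do
        if [ -u "$f" ]; then
            echo "ok: $f"
        else
            echo "not setuid: $f"
        fi
    done
}
```

For example: `report_setuid /sbin/ifconfig /usr/sbin/arping`.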

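4.5 Configure pgpool.conf

Check the local network interface name; it is needed for the delegate_IP settings below:

[postgres@master etc]$ ifconfig
[Figure: network interface name]

Configure pgpool.conf on master:

[postgres@master ~]$ cd /opt/pgpool/etc
[postgres@master etc]$ cp pgpool.conf.sample pgpool.conf
[postgres@master etc]$ vim pgpool.conf

Edit it as follows: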

# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'

#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
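health_check_password = 'nariadmin' # database password
                                   # Password for health check user
health_check_database = 'postgres'
# Health checking must be configured. Otherwise pgpool does not notice that the primary has gone down and cannot fail over promptly, while the standby keeps trying to stream from it and reports connection failures.
# Without it, pgpool only discovers the failure, and fails over, the next time a client connects through pgpool and the connection errors out.


# Failover command configuration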
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgpool/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = 'master'
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = 'vip'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig eth1:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
if_down_cmd = 'ifconfig eth1:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'slave'
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'eth1'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'slave' # peer
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)

Configure pgpool.conf on slave:

# CONNECTIONS
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898

# - Backend Connection Settings -

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/home/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/home/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'

# FILE LOCATIONS
pid_file_name = '/opt/pgpool/pgpool.pid'

replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'

#------------------------------------------------------------------------------
# HEALTH CHECK
#------------------------------------------------------------------------------

health_check_period = 10 # Health check period
                                   # Disabled (0) by default
health_check_timeout = 20
                                   # Health check timeout
                                   # 0 means no timeout
health_check_user = 'postgres'
                                   # Health check user
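health_check_password = 'nariadmin' # database password
                                   # Password for health check user
health_check_database = 'postgres'
# Health checking must be configured. Otherwise pgpool does not notice that the primary has gone down and cannot fail over promptly, while the standby keeps trying to stream from it and reports connection failures.
# Without it, pgpool only discovers the failure, and fails over, the next time a client connects through pgpool and the connection errors out.


# Failover command configuration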
#------------------------------------------------------------------------------
# FAILOVER AND FAILBACK
#------------------------------------------------------------------------------

failover_command = '/opt/pgpool/failover_stream.sh %H '

#------------------------------------------------------------------------------
# WATCHDOG
#------------------------------------------------------------------------------

# - Enabling -
use_watchdog = on
# - Watchdog communication Settings -

wd_hostname = 'slave'  # this node
                                    # Host name or IP address of this watchdog
                                    # (change requires restart)
wd_port = 9000
                                    # port number for watchdog service
                                    # (change requires restart)
# - Virtual IP control Setting -

delegate_IP = 'vip'
                                    # delegate IP address
                                    # If this is empty, virtual IP never bring up.
                                    # (change requires restart)
if_cmd_path = '/sbin'
                                    # path to the directory where if_up/down_cmd exists
                                    # (change requires restart)
if_up_cmd = 'ifconfig eth1:0 inet $_IP_$ netmask 255.255.255.0'
                                    # startup delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
if_down_cmd = 'ifconfig eth1:0 down'
                                    # shutdown delegate IP command
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
# -- heartbeat mode --

wd_heartbeat_port = 9694
                                    # Port number for receiving heartbeat signal
                                    # (change requires restart)
wd_heartbeat_keepalive = 2
                                    # Interval time of sending heartbeat signal (sec)
                                    # (change requires restart)
wd_heartbeat_deadtime = 30
                                    # Deadtime interval for heartbeat signal (sec)
                                    # (change requires restart)
heartbeat_destination0 = 'master' # peer
                                    # Host name or IP address of destination 0
                                    # for sending heartbeat signal.
                                    # (change requires restart)
heartbeat_destination_port0 = 9694
                                    # Port number of destination 0 for sending
                                    # heartbeat signal. Usually this is the
                                    # same as wd_heartbeat_port.
                                    # (change requires restart)
heartbeat_device0 = 'eth1'
                                    # Name of NIC device (such like 'eth0')
                                    # used for sending/receiving heartbeat
                                    # signal to/from destination 0.
                                    # This works only when this is not empty
                                    # and pgpool has root privilege.
                                    # (change requires restart)
                                    # change eth1 to match the local NIC name
# - Other pgpool Connection Settings -

other_pgpool_hostname0 = 'master' # peer
                                    # Host name or IP address to connect to for other pgpool 0
                                    # (change requires restart)
other_pgpool_port0 = 9999
                                    # Port number for other pgpool 0
                                    # (change requires restart)
other_wd_port0 = 9000
                                    # Port number for other watchdog 0
                                    # (change requires restart)
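
The two pgpool.conf files listed above differ only in three watchdog settings: the name of this node and the two settings that point at the peer. The sketch below prints those three lines for a given node, which can make it easier to review the difference. `wd_peer_settings` is a hypothetical helper, not part of pgpool.

```shell
#!/bin/sh
# wd_peer_settings THIS_HOST PEER_HOST: print the watchdog settings that
# differ between the two nodes' pgpool.conf files.
wd_peer_settings() {
    cat <<EOF
wd_hostname = '$1'
heartbeat_destination0 = '$2'
other_pgpool_hostname0 = '$2'
EOF
}
```

`wd_peer_settings master slave` gives the values for master's file, and `wd_peer_settings slave master` the values for slave's file.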

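The configuration sets failover_command = '/opt/pgpool/failover_stream.sh %H ', so a failover_stream.sh script has to be created in /opt/pgpool:

[postgres@master ~]$ cd /opt/pgpool
[postgres@master pgpool]$ touch failover_stream.sh
[postgres@master pgpool]$ vim failover_stream.sh

Note that the script uses pg_ctl promote rather than a trigger file, because a trigger file causes problems when switching back and forth. Its content: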

#! /bin/sh 
# Failover command for streaming replication. 
# Arguments: $1: new master hostname. 

new_master=$1 
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" 

# Promote standby database.
/usr/bin/ssh -T $new_master $trigger_command 

exit 0; 
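
The script above runs pg_ctl promote on every failover event, including the case where the node that failed was the standby. A possible refinement, shown here only as a sketch, is to pass pgpool's %d (failed node id) and %P (old primary node id) placeholders as well and to promote only when they match. The changed failover_command line, the function names, and the `SSH_BIN` override are assumptions for illustration, not the configuration used in this article.

```shell
#!/bin/sh
# Sketch of a guarded failover script. Assumed pgpool.conf entry:
#   failover_command = '/opt/pgpool/failover_stream.sh %d %P %H'
# %d = failed node id, %P = old primary node id, %H = new master host name.

should_promote() {
    # Promote only when the node that failed was the primary.
    [ "$1" = "$2" ]
}

run_failover() {
    failed_node=$1
    old_primary=$2
    new_master=$3
    if should_promote "$failed_node" "$old_primary"; then
        # SSH_BIN is a hypothetical override that allows a dry run.
        "${SSH_BIN:-/usr/bin/ssh}" -T "$new_master" "$PGHOME/bin/pg_ctl promote -D $PGDATA"
    else
        echo "standby node $failed_node failed; nothing to promote"
    fi
}

# In the real script the last line would be: run_failover "$1" "$2" "$3"
```

Setting `SSH_BIN=echo` prints the command that would be sent instead of running it, which is a convenient way to test the script before relying on it.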

If the script was created by another user, make sure postgres owns it and can execute it, for example:

[root@master ~]# chown -R postgres.postgres /opt/pgpool
[root@master ~]# chmod 755 /opt/pgpool/failover_stream.sh

5 PGPool cluster management

Before starting, create the log and run directories on both master and slave:

[root@master ~]# mkdir /var/log/pgpool
[root@master ~]# chown -R postgres.postgres /var/log/pgpool
[root@master ~]# mkdir /var/run/pgpool
[root@master ~]# chown -R postgres.postgres /var/run/pgpool

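5.1 Start the cluster

Start the PG databases on the primary and the standby:

# on master
[postgres@master ~]$ pg_ctl start -D $PGDATA
# on slave
[postgres@slave ~]$ pg_ctl start -D $PGDATA

Then start pgpool on each node:

# on master
# -D discards the saved pgpool_status file, so the up/down state of the PG nodes is detected afresh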
[postgres@master ~]$ pgpool -n -d -D > /var/log/pgpool/pgpool.log 2>&1 &
[1] 3557

# on slave
[postgres@slave ~]$ pgpool -n -d -D > /var/log/pgpool/pgpool.log 2>&1 &
[1] 3557

To stop pgpool quickly:

[postgres@master ~]$ pgpool -m fast stop

After pgpool has started, check the status of the cluster nodes:

[postgres@master ~]$ psql -h vip -p 9999
psql (9.6.1)
# prompts for the password
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0             | false  | 0
 1       | slave     | 5432 | up     | 0.500000  | standby | 0             |  true  | 0
(2 rows)

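# On the slave node the command is also psql -h vip -p 9999; with two pgpool instances behind the virtual IP, access is highly available.

Both the primary and the standby are in the normal up state.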
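
For monitoring, the `show pool_nodes` output can be filtered for nodes whose status is down. A minimal sketch that reads psql's default table output on stdin; `down_nodes` is a hypothetical helper and it assumes the column order shown above (hostname second, status fourth):

```shell
#!/bin/sh
# down_nodes: read "show pool_nodes" output on stdin and print the host
# names of all nodes whose status column is "down".
down_nodes() {
    awk -F'|' 'NF >= 4 {
        host = $2; status = $4
        gsub(/[ \t]/, "", host)
        gsub(/[ \t]/, "", status)
        if (status == "down") print host
    }'
}
```

It could be used as `psql -h vip -p 9999 -c 'show pool_nodes' | down_nodes`; empty output means every node is up.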

5.2 pgpool HA

5.2.1 Simulate a pgpool failure on master

Stop the pgpool service on the master node:
[postgres@master ~]$ pgpool -m fast stop
# wait a moment, then connect to the cluster
[postgres@master ~]$ psql -h vip -p 9999
psql (9.6.1)
# prompts for the password
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0             | false  | 0
 1       | slave     | 5432 | up     | 0.500000  | standby | 0             |  true  | 0
(2 rows)
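# The connection succeeds: after pgpool on master went down, pgpool on slave took over the vip and the cluster service, and application access was not interrupted.
# After restarting pgpool on master, stopping the pgpool service on slave gives the same result.

5.2.2 Simulate a PG primary failure on master

[postgres@master ~]$ pg_ctl stop
# master log output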
2017-07-24 18:52:37.751 PDT [28154] STATEMENT:  SELECT pg_current_xlog_location()
2017-07-24 18:52:37.760 PDT [2553] LOG:  received fast shutdown request
2017-07-24 18:52:37.760 PDT [2553] LOG:  aborting any active transactions
2017-07-24 18:52:37.762 PDT [28156] FATAL:  canceling authentication due to timeout
2017-07-24 18:52:37.763 PDT [2555] LOG:  shutting down
2017-07-24 18:52:37.768 PDT [28158] FATAL:  the database system is shutting down
2017-07-24 18:52:37.775 PDT [28159] FATAL:  the database system is shutting down
2017-07-24 18:52:39.653 PDT [2553] LOG:  database system is shut down

# slave log output
2017-07-24 18:52:41.455 PDT [2614] LOG:  invalid record length at 0/2A000098: wanted 24, got 0
2017-07-24 18:52:47.333 PDT [2614] LOG:  received promote request
2017-07-24 18:52:47.333 PDT [2614] LOG:  redo done at 0/2A000028
2017-07-24 18:52:47.333 PDT [2614] LOG:  last completed transaction was at log time 2017-07-24 18:17:00.946759-07
2017-07-24 18:52:47.336 PDT [2614] LOG:  selected new timeline ID: 10
2017-07-24 18:52:47.841 PDT [2614] LOG:  archive recovery complete
2017-07-24 18:52:47.851 PDT [2613] LOG:  database system is ready to accept connections

# The logs clearly show that the primary went down and slave was promoted.
# wait a moment, then connect to the cluster
[postgres@master ~]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
 1       | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)
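# slave has been promoted to primary, and the master node's status is down

5.2.3 Repair the master node and rejoin the cluster

After the master node went down, slave was promoted to primary. Once master is repaired, it should rejoin the cluster as a standby of the new primary.
Repair and start master:

[postgres@master ~]$ cd $PGDATA
[postgres@master data]$ mv recovery.done recovery.conf # the file must be renamed from .done to .conf
[postgres@master data]$ pg_ctl start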
[postgres@master data]$ pg_ctl start

Attach the node to the pgpool cluster:

# master's node_id is 0, hence -n 0
[postgres@master data]$ pcp_attach_node -d -U postgres -h vip -p 9898 -n 0
# When prompted, enter the pcp administrator password.
# Check the current status
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up    | 0.500000  | standby | 0             | false  | 0
 1       | slave     | 5432 | up     | 0.500000  | primary | 0             |  true  | 0
(2 rows)

5.2.4 Host goes down

slave is currently the primary. After powering off the slave server directly, failover takes place: slave is down and master has been promoted to primary:

[postgres@master ~]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | slave    | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

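5.3 Timeline synchronization

After a failover, when a repaired node is restarted and rejoins the cluster as a streaming-replication standby, it may fail with a timeline mismatch error, because the data on the primary or on the repaired node has changed in the meantime:

# after slave restarts, its data has diverged from master's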
[postgres@slave data]$ mv recovery.done recovery.conf
[postgres@slave data]$ pg_ctl start
waiting for server to start....2017-07-24 19:31:44.563 PDT [2663] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2017-07-24 19:31:44.563 PDT [2663] LOG:  listening on IPv6 address "::", port 5432
2017-07-24 19:31:44.565 PDT [2663] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-07-24 19:31:44.584 PDT [2664] LOG:  database system was shut down at 2017-07-24 19:31:30 PDT
2017-07-24 19:31:44.618 PDT [2664] LOG:  entering standby mode
2017-07-24 19:31:44.772 PDT [2664] LOG:  consistent recovery state reached at 0/2D000098
2017-07-24 19:31:44.772 PDT [2663] LOG:  database system is ready to accept read only connections
2017-07-24 19:31:44.772 PDT [2664] LOG:  invalid record length at 0/2D000098: wanted 24, got 0
2017-07-24 19:31:44.798 PDT [2668] LOG:  fetching timeline history file for timeline 11 from primary server
2017-07-24 19:31:44.826 PDT [2668] FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/2D000000 on timeline 10 is not in this server's history
    DETAIL:  This server's history forked from timeline 10 at 0/2B0001B0.
2017-07-24 19:31:44.826 PDT [2664] LOG:  new timeline 11 forked off current database system timeline 10 before current recovery point 0/2D000098
 done
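
The FATAL message in this log is the sign that the timelines have diverged. As an illustration, a startup log can be scanned for it before deciding whether a rewind is needed. `has_timeline_fork` is a hypothetical helper, and the log file path passed to it depends on the local logging setup:

```shell
#!/bin/sh
# has_timeline_fork LOGFILE: succeed if the PostgreSQL log contains the
# "not in this server's history" error reported when timelines diverge.
has_timeline_fork() {
    grep -q "is not in this server's history" "$1"
}
```

For example: `has_timeline_fork /path/to/postgresql.log && echo "rewind needed"`.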

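In this case, use the pg_rewind tool to resynchronize the node's timeline, following the steps below.

5.3.1 Stop the PG service on the node to be synchronized

[postgres@slave ~]$ pg_ctl stop

5.3.2 Synchronize the timeline from the master node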

[postgres@slave data]$ pg_rewind  --target-pgdata=/home/postgres/data --source-server='host=master port=5432 user=postgres dbname=postgres password=nariadmin'
servers diverged at WAL location 0/2B0001B0 on timeline 10
rewinding from last common checkpoint at 0/2B000108 on timeline 10
Done!
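
pg_rewind only works when the target cluster was initialized with data checksums or has been running with wal_log_hints = on. The sketch below checks postgresql.conf for the second condition. It is a simple text match: it does not see values set in included files or via ALTER SYSTEM, and it cannot detect data checksums. `rewind_ready` is a hypothetical helper name.

```shell
#!/bin/sh
# rewind_ready CONF: succeed if CONF contains an uncommented
# "wal_log_hints = on" (also accepts 'on', true, 1).
rewind_ready() {
    grep -Eq "^[[:space:]]*wal_log_hints[[:space:]]*=[[:space:]]*'?(on|true|1)'?" "$1"
}
```

For example: `rewind_ready $PGDATA/postgresql.conf || echo "enable wal_log_hints before relying on pg_rewind"`.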

5.3.3 Edit pg_hba.conf and recovery.done

# pg_hba.conf and recovery.done were copied over from master, so change them to slave's own settings
[postgres@slave ~]$ cd $PGDATA
[postgres@slave data]$ mv recovery.done recovery.conf
[postgres@slave data]$ vi pg_hba.conf
# change slave to master (master is slave's replication peer)
host    replication     repuser         master                   md5
[postgres@slave data]$ vi recovery.conf
# change slave to master (master is slave's replication peer)
primary_conninfo = 'host=master port=5432 user=repuser password=repuser'   
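
Because recovery.conf has to point at whichever node is currently the primary, it can be convenient to generate it from the peer's host name. A sketch under stated assumptions: only the primary_conninfo line comes from this article; standby_mode and recovery_target_timeline are standard recovery.conf parameters in the PG versions that use this file, but they are added here as an assumption about what the author's file contains. `make_recovery_conf` is a hypothetical helper.

```shell
#!/bin/sh
# make_recovery_conf PRIMARY_HOST: print a recovery.conf for a standby that
# streams from PRIMARY_HOST. The repuser credentials are the ones used above.
make_recovery_conf() {
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$1 port=5432 user=repuser password=repuser'
recovery_target_timeline = 'latest'
EOF
}
```

On slave this would be `make_recovery_conf master > $PGDATA/recovery.conf`, and on master, after a failover, `make_recovery_conf slave > $PGDATA/recovery.conf`.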

5.3.4 Restart the PG service

[postgres@slave data]$ pg_ctl start
waiting for server to start....2017-07-24 19:47:06.821 PDT [2722] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2017-07-24 19:47:06.821 PDT [2722] LOG:  listening on IPv6 address "::", port 5432
2017-07-24 19:47:06.907 PDT [2722] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2017-07-24 19:47:06.930 PDT [2723] LOG:  database system was interrupted while in recovery at log time 2017-07-24 19:25:42 PDT
2017-07-24 19:47:06.930 PDT [2723] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2017-07-24 19:47:06.961 PDT [2723] LOG:  entering standby mode
2017-07-24 19:47:06.966 PDT [2723] LOG:  redo starts at 0/2B0000D0
2017-07-24 19:47:06.971 PDT [2723] LOG:  consistent recovery state reached at 0/2B01CA30
2017-07-24 19:47:06.972 PDT [2722] LOG:  database system is ready to accept read only connections
2017-07-24 19:47:06.972 PDT [2723] LOG:  invalid record length at 0/2B01CA30: wanted 24, got 0
2017-07-24 19:47:06.982 PDT [2727] LOG:  started streaming WAL from primary at 0/2B000000 on timeline 11
 done
server started

5.3.5 Rejoin the cluster

# slave's node_id is 1, hence -n 1
[postgres@slave data]$ pcp_attach_node -d -U postgres -h vip -p 9898 -n 1
Password: # enter the pcp administrator password
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="C", len=6
DEBUG: recv: tos="c", len=20
pcp_attach_node -- Command Successful
DEBUG: send: tos="X", len=4

5.3.6 Check the cluster node status

[postgres@slave data]$ psql -h vip -p 9999
Password: 
psql (10beta1)
Type "help" for help.

postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | master   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | slave    | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

Recovery is now complete.
