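There is already plenty of material online about PostgreSQL hot standby and master/standby switchover. This post simply records the procedure I followed in my own tests, in the hope that it is a useful reference for others. I used PostgreSQL 9.6.1 for the tests.
1 Master and standby host planning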
Hostname | IP | Role | Port
:----:|:----:|:----:|:----:
master| 192.168.0.108 |Master|5432
slave|192.168.0.109|Slave|5432
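Prerequisite: PostgreSQL is installed on both hosts. For the installation steps, see my earlier post: http://www.lxweimin.com/p/639ebb43bfb4
2 Setting up streaming replication
2.1 Configure the hosts file
Do this on both the master and the slave node.
[root@bogon ~]# vim /etc/hosts
# Add the following entries (use the actual IP addresses of your own two hosts):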
192.168.43.127 master
192.168.43.243 slave
Press Esc, then type :wq! to save and exit.
2.2 Initialize the master database
Run the following on the master:
# Switch to the postgres account
[root@bogon ~]# su - postgres
# Initialize the data directory
[postgres@bogon ~]$ initdb -D $PGDATA
# Start the master database
[postgres@bogon ~]$ pg_ctl start -D $PGDATA
# Create the streaming replication user
[postgres@bogon ~]$ psql
psql (9.6.1)
Type "help" for help.
postgres=# CREATE USER repuser replication LOGIN CONNECTION LIMIT 3 ENCRYPTED PASSWORD 'repuser';
CREATE ROLE
2.3 Configure pg_hba.conf
Append the following lines to the end of pg_hba.conf on the master:
host all all 0.0.0.0/0 md5
host replication repuser slave md5
2.4 Configure postgresql.conf
Set the following on the master:
listen_addresses = '*'
port = 5432
max_wal_senders = 2    # pg_basebackup -Xs alone uses two WAL sender connections
wal_level = replica
archive_mode = on
archive_command = 'cd ./'
hot_standby = on
wal_keep_segments = 64
full_page_writes = on
wal_log_hints = on
After changing the configuration, restart the master database:
[postgres@bogon ~]$ pg_ctl restart -D $PGDATA
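Before restarting, it can help to check the settings that the later steps depend on. The sketch below is only an illustration: `conf_value` and `check_master_conf` are hypothetical helper names, not PostgreSQL tools, and they only understand plain `key = value` lines (no include files, no `key value` form). It assumes that `wal_level` is set to `replica` as in this post, that `pg_rewind` will rely on `wal_log_hints` rather than data checksums, and that `pg_basebackup -Xs` needs two WAL sender connections.

```shell
# Print the effective value of one setting from a postgresql.conf-style file.
# Hypothetical helper: handles only uncommented "key = value" lines.
conf_value() {
    local file="$1"
    local key="$2"
    # Keep the last matching line, drop trailing comments, spaces and quotes.
    sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*([^#]*).*/\1/p" "$file" \
        | tail -n 1 \
        | sed -E "s/[[:space:]]+$//; s/^'(.*)'$/\1/"
}

# Check the settings used in this walkthrough; prints one line per problem
# and returns non-zero if any problem was found.
check_master_conf() {
    local file="$1"
    local problems=0
    local senders
    [ "$(conf_value "$file" wal_level)" = "replica" ] \
        || { echo "wal_level should be replica"; problems=1; }
    [ "$(conf_value "$file" wal_log_hints)" = "on" ] \
        || { echo "wal_log_hints should be on for pg_rewind"; problems=1; }
    senders=$(conf_value "$file" max_wal_senders)
    [ "${senders:-0}" -ge 2 ] \
        || { echo "max_wal_senders should be at least 2 for pg_basebackup -Xs"; problems=1; }
    return "$problems"
}
```

A possible call on the master would be `check_master_conf "$PGDATA/postgresql.conf"`; no output and a zero exit status mean the three settings look as expected.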
2.5 Create the standby with pg_basebackup
Run the following as the postgres user on the slave:
# Switch to the postgres account
[root@bogon ~]# su - postgres
# Create the standby from a base backup of the master
[postgres@bogon ~]$ pg_basebackup -D $PGDATA -Fp -Xs -v -P -h master -p 5432 -U repuser
transaction log start point: 0/2000060 on timeline 1
pg_basebackup: starting background WAL receiver
22806/22806 kB (100%), 1/1 tablespace
transaction log end point: 0/2000130
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
On the slave, edit pg_hba.conf in the data directory so that its last lines read:
host all all 0.0.0.0/0 md5
host replication repuser master md5
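2.6 Configure recovery.conf
On the master (the file is saved as recovery.done so that it stays inactive until the master is later turned into a standby):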
[postgres@bogon ~]$ ls
bin data gdal geos include lib proj4 share
[postgres@bogon ~]$ cp share/recovery.conf.sample data/recovery.done
[postgres@bogon ~]$ vim data/recovery.done
# Set the following:
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=slave port=5432 user=repuser password=repuser'
trigger_file = '/home/postgres/data/trigger_file'
On the slave:
[postgres@bogon ~]$ ls
bin data gdal geos include lib proj4 share
[postgres@bogon ~]$ cp share/recovery.conf.sample data/recovery.conf
[postgres@bogon ~]$ vim data/recovery.conf
# Set the following:
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=master port=5432 user=repuser password=repuser'
trigger_file = '/home/postgres/data/trigger_file'
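The two files above differ only in the host named in primary_conninfo: each node points at the other one. As a small sketch of that symmetry, the hypothetical helper `render_recovery_conf` below prints the same four settings for a given upstream host. The user, password and port are the test values from this post and are hard-coded here purely for illustration.

```shell
# Print recovery.conf content for a standby that follows the given host.
# Hypothetical helper; credentials and port are the test values from this post.
render_recovery_conf() {
    local primary_host="$1"
    local trigger_file="$2"
    cat <<EOF
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=${primary_host} port=5432 user=repuser password=repuser'
trigger_file = '${trigger_file}'
EOF
}
```

On the slave this could be used as `render_recovery_conf master /home/postgres/data/trigger_file > data/recovery.conf`, and on the master with `slave` as the host and `data/recovery.done` as the output file.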
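2.7 Configure .pgpass
Each entry has the form hostname:port:database:username:password. For streaming replication connections the database field must be replication.
On the master, configure access to the slave:
[postgres@bogon ~]$ vim .pgpass
slave:5432:replication:repuser:repuser
On the slave, configure access to the master:
[postgres@bogon ~]$ vim .pgpass
master:5432:replication:repuser:repuser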
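One detail that is easy to miss: on Unix, libpq ignores a password file whose permissions are looser than 0600. The sketch below appends an entry and tightens the permissions in one step. `add_pgpass_entry` is a hypothetical helper name, and it does not escape colons or backslashes inside the values, which a real password file would require.

```shell
# Append one entry to a libpq password file and restrict its permissions.
# Hypothetical helper; values containing ':' or '\' are not escaped.
add_pgpass_entry() {
    local file="$1"
    local host="$2"
    local port="$3"
    local db="$4"
    local user="$5"
    local password="$6"
    # Field order: hostname:port:database:username:password
    printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$password" >> "$file"
    # libpq skips the file if group or others have any access to it.
    chmod 0600 "$file"
}
```

For example, on the slave: `add_pgpass_entry ~/.pgpass master 5432 replication repuser repuser`.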
2.8 Test streaming replication
Start the master and slave databases.
On the master, create a database and a test table:
[postgres@bogon data]$ psql
psql (9.6.1)
Type "help" for help.
# Set a password for the postgres user
postgres=# \password
# Create the test database
postgres=# create database test;
CREATE DATABASE
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# create table tt(id serial not null,name text);
CREATE TABLE
test=# insert into tt(name) values ('china');
INSERT 0 1
On the slave, query the table and row just created to confirm that the data was replicated:
[postgres@bogon data]$ psql
psql (9.6.1)
Type "help" for help.
postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# select * from tt;
id | name
----+-------
1 | china
(1 row)
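The standby has clearly replicated the master's data, so the streaming-replication hot standby is now set up. The rest of this post covers some simple uses of it.
3 Master/standby switchover
Several commands can tell you whether a database is a master or a standby: the master is read-write and the standby is read-only. When the master goes down, creating the trigger file promotes the standby to master, which is enough for some basic HA scenarios.
3.1 Identifying master and standby
3.1.1 pg_controldata
Master: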
[postgres@localhost ~]$ pg_controldata
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6362107256088627972
Database cluster state: in production
Standby:
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6362107256088627972
Database cluster state: in archive recovery
The master's cluster state is "in production"; the standby's cluster state is "in archive recovery".
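The same check can be scripted. The sketch below reads pg_controldata output on standard input and maps the two states shown above to a role. `cluster_state` and `node_role` are hypothetical helper names, and the sketch assumes English output (for example by running pg_controldata with `LC_ALL=C`), since the tool's messages are localized.

```shell
# Extract the "Database cluster state" value from pg_controldata output on stdin.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Map the cluster state to a role; any other state is reported as unknown.
node_role() {
    case "$(cluster_state)" in
        "in production")       echo master ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}
```

A possible call is `LC_ALL=C pg_controldata "$PGDATA" | node_role`.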
3.1.2 The pg_stat_replication view
On the master this view returns rows; on the standby it returns none.
postgres=# select pid,application_name,client_addr,client_port,state,sync_state from pg_stat_replication;
pid | application_name | client_addr | client_port | state | sync_state
-------+------------------+---------------+-------------+-----------+------------
17131 | walreceiver | 192.168.0.105 | 55734 | streaming | async
(1 row)
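For monitoring, the same view can be reduced to a single number. The sketch below counts rows whose state is streaming. It assumes the query output is produced in unaligned, tuples-only mode with the default `|` field separator and with the state in the second column; `streaming_standbys` is a hypothetical helper name.

```shell
# Count rows whose second '|'-separated field is "streaming".
# Expects lines such as "walreceiver|streaming" on stdin; prints 0 for empty input.
streaming_standbys() {
    awk -F'|' '$2 == "streaming" { n++ } END { print n + 0 }'
}
```

On the master this could be fed with `psql -At -c "select application_name, state from pg_stat_replication" | streaming_standbys`; a result of 0 would suggest that no standby is currently streaming.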
3.1.3 Identifying by process
The host whose process list shows a wal sender process is the master; the host that shows a wal receiver process is the standby.
3.1.4 Using a PostgreSQL function
pg_is_in_recovery() returns t on the standby and f on the master.
Master:
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
Standby:
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
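3.2 Promote the standby to master
To simulate a master failure, run pg_ctl stop on the master.
Once the master is down, the standby reports errors saying it can no longer connect to the master.
The following was configured earlier:
trigger_file = '/home/postgres/data/trigger_file'
To promote the standby to master, run this on the standby:
[postgres@localhost ]$ touch /home/postgres/data/trigger_file
Checking the processes on the standby again shows that it has become the master.
3.3 Turn the original master into a standby
On the current master (the host slave, now that the roles have switched), run an insert: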
test=# insert into tt(name) values('sdf');
If recovery.done in the data directory on master has not been renamed to recovery.conf, rename it manually:
[postgres@data]$ mv recovery.done recovery.conf
If it is already named recovery.conf, go straight to the next step.
# Start the database
[postgres@data]$ pg_ctl start
[postgres@bogon data]$ ERROR: requested starting point 0/6000000 on timeline 1 is not in this server's history
DETAIL: This server's history forked from timeline 1 at 0/4000098.
ERROR: requested starting point 0/6000000 on timeline 1 is not in this server's history
DETAIL: This server's history forked from timeline 1 at 0/4000098.
ERROR: requested starting point 0/6000000 on timeline 1 is not in this server's history
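# Many more timeline mismatch errors follow
The cause is that the two servers have diverged. When slave was promoted, its history forked from timeline 1 at 0/4000098 and its data has changed since, while the original master is asking for WAL on timeline 1 starting at 0/6000000, a point that does not exist in slave's history. The timelines have to be made consistent again, and we use pg_rewind to do that.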
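To see why the request is rejected, it helps to compare the two positions from the log numerically. An LSN such as 0/4000098 is two hexadecimal numbers: the part before the slash is the high 32 bits and the part after it is the low 32 bits. The sketch below uses two hypothetical helpers, `lsn_to_int` and `past_fork_point`, to show that the requested start point lies beyond the fork point. It assumes a shell with 64-bit arithmetic and a `printf` that accepts `0x` hexadecimal constants.

```shell
# Convert an LSN such as 0/4000098 to an integer: high * 2^32 + low (both hex).
lsn_to_int() {
    local hi="${1%%/*}"
    local lo="${1##*/}"
    hi=$(printf '%d' "0x$hi")
    lo=$(printf '%d' "0x$lo")
    echo $(( hi * 4294967296 + lo ))
}

# Succeed if the requested start point ($1) lies beyond the fork point ($2).
past_fork_point() {
    [ "$(lsn_to_int "$1")" -gt "$(lsn_to_int "$2")" ]
}
```

With the values from the log, `past_fork_point 0/6000000 0/4000098` succeeds: the original master is asking for timeline 1 WAL at a position after the point where the new master left that timeline.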
On master (the host that is now the standby):
# Pull the latest timeline and data from slave onto the original master
[postgres@bogon ~]$ pg_rewind --target-pgdata=/home/postgres/data --source-server='host=slave port=5432 user=postgres dbname=postgres'
target server must be shut down cleanly
Failure, exiting
The command fails because the target server must be shut down cleanly. The source server is slave, so the target server is master; stop the PostgreSQL service on master first.
# Stop the service first
[postgres@bogon ~]$ pg_ctl stop
# Then pull the data
[postgres@bogon ~]$ pg_rewind --target-pgdata=/home/postgres/data --source-server='host=slave port=5432 user=postgres dbname=postgres'
servers diverged at WAL position 0/4000098 on timeline 1
rewinding from last common checkpoint at 0/4000028 on timeline 1
Done!
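# Start the service again (first confirm that data/recovery.conf still exists and that its primary_conninfo points to slave, because pg_rewind also copies configuration files from the source)
[postgres@bogon ~]$ pg_ctl start
You can then verify that the data on master is consistent with slave again.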