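Greenplum is a distributed relational database built on PostgreSQL. Today we'll try deploying it on Kubernetes.
Deploying with docker-compose
Before deploying to Kubernetes, let's first walk through an example that uses Docker. This example is based on this article; thanks to its author for providing the base image.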
1. Download Greenplum
https://github.com/greenplum-db/gpdb/releases
2. Build the Greenplum image
Define the Dockerfile:
# The lyasper/gphost base image provides an sshd service and some initialization scripts
FROM lyasper/gphost
# open-source-greenplum-xx...rpm is the Greenplum package downloaded above
COPY open-source-greenplum-db-6.14.1-rhel7-x86_64.rpm /home/gpadmin/greenplum-db.rpm
RUN yum install -y /home/gpadmin/greenplum-db.rpm
RUN chown -R gpadmin /usr/local/greenplum-db*
RUN rm -f /home/gpadmin/greenplum-db.rpm
Then build the image:
docker build -t greenplum:6 .
3. Write a docker-compose.yaml file
version: '3'
services:
  mdw:
    hostname: mdw
    image: "greenplum:6"
    ports:
      - "6222:22"
      - "6432:5432"
  sdw1:
    hostname: sdw1
    image: "greenplum:6"
  sdw2:
    hostname: sdw2
    image: "greenplum:6"
  etl:
    hostname: etl
    image: "greenplum:6"
docker-compose up
4. Initialize Greenplum
# Log in to the master node
ssh -p 6222 gpadmin@127.0.0.1
# or: ssh -p 6222 gpadmin@0.0.0.0
# password: changeme
# Set up the Greenplum environment variables
source /usr/local/greenplum-db/greenplum_path.sh
# Generate the Greenplum configuration files
artifact/prepare.sh -s 2 -n 2
# -s: number of segment hosts (containers)
# -n: number of primary segments per segment host (container)
# Initialize the cluster; this generates env.sh (the environment variables Greenplum needs)
gpinitsystem -a -c gpinitsystem_config
source env.sh
# Enable password-free remote access
artifact/postinstall.sh
# Check the installation result
ps -ef | grep postgres
# Check the cluster status
gpstate -s
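The -s and -n options above determine how the primary segments are laid out. The following is a minimal sketch of the layout you should expect; segment_layout is a hypothetical helper, not part of the lyasper/gphost scripts. It assumes segment hosts are named <prefix>1 to <prefix>N, segment content IDs (gpsegN) are numbered host by host, and each host's ports start at a base of 10000, which is what the gpstate -s output in the k8s section later in this article shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of prepare.sh): print the primary-segment layout
# expected from `prepare.sh -s HOSTS -n PER_HOST`.
# Assumptions: hosts are <prefix>1..<prefix>HOSTS, content IDs are numbered host by
# host, and each host's segment ports start at PORT_BASE.
segment_layout() {
  local hosts=$1 per_host=$2 port_base=${3:-10000} prefix=${4:-sdw}
  local content=0 h i
  for h in $(seq 1 "$hosts"); do
    for i in $(seq 0 $((per_host - 1))); do
      # columns: host, segment data directory name, port
      printf '%s%s gpseg%d %d\n' "$prefix" "$h" "$content" $((port_base + i))
      content=$((content + 1))
    done
  done
}
```

For example, segment_layout 2 2 10000 sdw prints four lines, from sdw1 gpseg0 10000 to sdw2 gpseg3 10001: -s 2 -n 2 yields 2 × 2 = 4 primary segments.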
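Deploying on k8s
Building a k8s-specific image (optional)
Before deploying to k8s, we need to rework the existing image into a k8s-specific Greenplum image. I will push the finished image to Alibaba Cloud so you can pull it directly.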
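- First start a container, then enter it
Note: do not append /bin/bash (or any other command) to docker run. If you do, committing the container later overwrites the CMD of the original image. That is why we start the container first and then exec into it from another terminal.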
$ docker run -it --rm greenplum:6
$ docker exec -it a2b9a823d845 /bin/bash
- Modify the artifact/prepare.sh script in three places:
- Line 7: append .gp to the master address: MASTERHOST=`hostname`.gp
- Line 8: change the segment host prefix: SEG_PREFIX=gp-
- Lines 87 and 90: append .gp to each address
#!/bin/bash
set -x
if [ -z ${GPHOME+x} ]; then echo "GPHOME is unset";exit 1 ; fi
MASTERHOST=`hostname`.gp
SEG_PREFIX=gp-
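SEG_HOSTNUM=0 # 0 means master only
SEG_NUMPERHOST=1
VERBOSE=0
.....
if [ $SEG_HOSTNUM -eq 0 ];then
    # master-only mode: MASTERHOST already ends in .gp (line 7), so this writes gp-0.gp.gp; drop the extra .gp if you use -s 0
    echo $MASTERHOST.gp > $HOSTFILE
else
    for i in $(seq 1 $SEG_HOSTNUM); do
        echo $SEG_PREFIX$i.gp >> $HOSTFILE
    done
fi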
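- Modify line 1320 of /usr/local/greenplum-db-6.14.1/bin/gpinitsystem so that .gp is appended to every resolved host address. Segment servers are then reached as gp-1.gp rather than gp-1: through the headless Service gp, each StatefulSet pod is resolvable as <pod>.<service> (gp-1.gp), whereas the bare pod name gp-1 is not resolvable from other pods.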
HOST_LOOKUP() {
    res=`echo $1 | $GPHOSTCACHELOOKUP`
    err_index=`echo $res | awk '{print index($0,"__lookup_of_hostname_failed__")}'`
    if [ $err_index -ne 0 ]; then
        echo "__lookup_of_hostname_failed__"
    else
        echo $res.gp    # line 1320: append .gp to the resolved host name
    fi
}
- Create an initialization script
Create the script /home/gpadmin/init.sh. It only does its work on the master (gp-0).
#!/bin/bash
# whoami: gpadmin
# only run on master
host=`hostname`
if [ "$host" = "gp-0" ]; then
    source /usr/local/greenplum-db/greenplum_path.sh
    artifact/prepare.sh -s 2 -n 2
    gpinitsystem -a -c gpinitsystem_config
    source env.sh
    artifact/postinstall.sh
    gpstate -s
fi
- Build a new image
I have already pushed this image to Alibaba Cloud, so in the later steps you can simply pull it.
$ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 60875d7be058 greenplum-k8s:6.14.18
$ docker tag greenplum-k8s:6.14.18 registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
$ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
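The .gp suffix works because of Kubernetes DNS: when a StatefulSet's serviceName points at a headless Service, each pod gets a DNS record <pod>.<service>.<namespace>.svc.<cluster domain>, and the pod's resolv.conf search list lets the short form gp-1.gp resolve. The sketch below assumes the namespace default and the cluster domain cluster.local. statefulset_hosts and host_lookup_gp are hypothetical helpers; host_lookup_gp is a defensive variation on the patched HOST_LOOKUP that only appends .gp when the name does not already end with it, and it falls back to cat when GPHOSTCACHELOOKUP is unset so it can be tried outside gpinitsystem.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: namespace "default", cluster domain "cluster.local".

# Print the DNS names that a StatefulSet's pods get through its headless Service.
statefulset_hosts() {
  local sts=$1 svc=$2 replicas=$3 ns=${4:-default} domain=${5:-cluster.local}
  local i
  for i in $(seq 0 $((replicas - 1))); do
    echo "${sts}-${i}.${svc}.${ns}.svc.${domain}"
  done
}

# Hypothetical variant of the patched HOST_LOOKUP: resolve the name via
# $GPHOSTCACHELOOKUP (stubbed with `cat` when unset), then append ".gp"
# only if the result does not already end with it.
host_lookup_gp() {
  local lookup=${GPHOSTCACHELOOKUP:-cat}
  local res
  res=$(echo "$1" | $lookup)
  case $res in
    *__lookup_of_hostname_failed__*) echo "__lookup_of_hostname_failed__" ;;
    *.gp) echo "$res" ;;
    *) echo "${res}.gp" ;;
  esac
}
```

With the defaults, statefulset_hosts gp gp 3 prints gp-0.gp.default.svc.cluster.local through gp-2.gp.default.svc.cluster.local, and host_lookup_gp returns gp-1.gp for both gp-1 and gp-1.gp.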
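Deploying to k8s
My k8s version is 1.18.0. Older versions may not support this setup: the local PersistentVolumes used below only became generally available in Kubernetes 1.14.
- Create a Local Volume
Here we use a local volume to store Greenplum's data. Before deploying, make sure the /var/greenplum directory exists on each node listed under nodeAffinity and has sufficient permissions.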
# gp-lv.yaml
### greenplum storage ###
apiVersion: v1
kind: PersistentVolume
metadata:
  name: greenplum-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: greenplum-storage
  local:
    path: /var/greenplum
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - master
          - node1
          - node2
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: greenplum-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: greenplum-storage
$ kubectl create -f gp-lv.yaml
$ kubectl get pv
$ kubectl get pvc
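- Deploy Greenplum
- It is deployed as a StatefulSet with three pods: gp-0 as the master, gp-1 and gp-2 as segment hosts
- The master exposes two ports: 8432 for the Greenplum database and 8222 for SSH. Both lie outside the default NodePort range (30000-32767), so the kube-apiserver must be started with a wider --service-node-port-range, or you should pick ports inside the default range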
# gp.yaml
### greenplum db ###
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gp
spec:
  selector:
    matchLabels:
      app: gp
  serviceName: gp
  replicas: 3
  template:
    metadata:
      labels:
        app: gp
    spec:
      securityContext:
        # Lets the non-root gpadmin user access the PersistentVolumeClaim; without it the local PVC is not writable
        #runAsUser: 1000
        fsGroup: 1000
      volumes:
      - name: gp-pv-storage
        persistentVolumeClaim:
          claimName: greenplum-claim
      containers:
      - name: gp-container
        image: registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
        imagePullPolicy: IfNotPresent
        # master/ and data/ mount the same volume without subPath, so both map to /var/greenplum on the node;
        # /home/gpadmin/mirror is not mounted, so mirror data is not persisted
        volumeMounts:
        - mountPath: "/home/gpadmin/master"
          name: gp-pv-storage
        - mountPath: "/home/gpadmin/data"
          name: gp-pv-storage
        ports:
        - name: gp-port
          containerPort: 5432
        - name: ssh-port
          containerPort: 22
---
#service
apiVersion: v1
kind: Service
metadata:
  name: gp
spec:
  selector:
    app: gp
  type: ClusterIP
  clusterIP: None
---
#service
apiVersion: v1
kind: Service
metadata:
  name: gp-out
spec:
  ports:
  - name: gp-port
    port: 8432
    nodePort: 8432
    protocol: TCP
    targetPort: gp-port
  - name: ssh-port
    port: 8222
    nodePort: 8222
    protocol: TCP
    targetPort: ssh-port
  selector:
    app: gp
    statefulset.kubernetes.io/pod-name: gp-0
  type: NodePort
$ kubectl create -f gp.yaml
$ kubectl get pods
- Initialize the Greenplum database
Connect to the master via SSH (password: changeme) and run the initialization from inside it; 192.168.x.x is the IP of any k8s node. After a successful initialization, each segment host runs two segments.
$ ssh -p 8222 gpadmin@192.168.x.x
$ ./init.sh
20210310:02:50:04:002506 gpstate:gp-0:gpadmin-[INFO]:-Obtaining Segment details from master...
20210310:02:50:04:002506 gpstate:gp-0:gpadmin-[INFO]:-Gathering data from segments...
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:--Master Configuration & Status
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master host = gp-0
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master postgres process ID = 2345
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master data directory = /home/gpadmin/master/gpseg-1
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master port = 5432
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master current role = dispatch
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Greenplum initsystem version = 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd Open Source
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Greenplum current version = PostgreSQL 9.4.24 (Greenplum Database 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb 22 2021 22:11:57
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Postgres version = 9.4.24
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Master standby = No master standby configured
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-Segment Instance Status Report
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Segment Info
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Hostname = gp-1.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Address = gp-1.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Datadir = /home/gpadmin/data/gpseg0
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Port = 10000
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Status
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- PID = 842
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Configuration reports status as = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Database status = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Segment Info
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Hostname = gp-1.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Address = gp-1.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Datadir = /home/gpadmin/data/gpseg1
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Port = 10001
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Status
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- PID = 843
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Configuration reports status as = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Database status = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Segment Info
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Hostname = gp-2.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Address = gp-2.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Datadir = /home/gpadmin/data/gpseg2
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Port = 10000
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Status
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- PID = 842
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Configuration reports status as = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Database status = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Segment Info
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Hostname = gp-2.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Address = gp-2.gp
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Datadir = /home/gpadmin/data/gpseg3
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Port = 10001
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Status
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- PID = 843
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Configuration reports status as = Up
20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:- Database status = Up
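Rather than reading every line of that output, you can condense it. A small sketch, assuming the gpstate -s line format shown above (a Hostname = ... line followed later by a Database status = ... line for each segment); summarize_gpstate is a hypothetical helper that reads gpstate output on stdin and prints, per segment host, how many of its segments report Up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `gpstate -s` output read from stdin.
# Prints "<host> <up>/<total> up" per segment host, sorted by host name.
summarize_gpstate() {
  awk -F'[ \t]*=[ \t]*' '
    { sub(/[ \t\r]+$/, "") }   # strip trailing whitespace (also re-splits the fields)
    /Hostname[ \t]*=/        { host = $2 }
    /Database status[ \t]*=/ { total[host]++; if ($2 == "Up") up[host]++ }
    END { for (h in total) printf "%s %d/%d up\n", h, up[h] + 0, total[h] }
  ' | sort
}
```

On the master you would run gpstate -s | summarize_gpstate; for the output above it should report gp-1.gp 2/2 up and gp-2.gp 2/2 up.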
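Restoring data
The biggest problem with running on k8s is that when a pod is recreated, its container reverts to the pristine image state. Fortunately the Greenplum data is stored on the host, but we can no longer use the init.sh script above: re-initializing would overwrite the existing data. So the image needs another modification.
As before, start a container, enter it, and add a new script, repair.sh, dedicated to recovering the data.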
# Start from the previous version of the image
$ docker run -it registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
Add repair.sh to the image:
#!/bin/bash
# whoami: gpadmin
# only run on master
host=`hostname`
if [ "$host" = "gp-0" ]; then
    source /usr/local/greenplum-db/greenplum_path.sh
    # Note: prepare.sh deletes the master, data and other directories; comment that part out first
    artifact/prepare.sh -s 2 -n 2
    #gpinitsystem -a -c gpinitsystem_config
    source env.sh
    #artifact/postinstall.sh
    # After the pod is recreated, the permissions on the mounted directories make startup fail; reset them to 0700
    sudo chmod 0700 -R $HOME/master/*
    for HOST in `cat $HOME/hostfile`; do
        ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/data/*"
        ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/master/*"
        ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/mirror/*"
    done
    gpstart
fi
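The chmod step matters because PostgreSQL 9.4, on which Greenplum 6 is based, refuses to start when a data directory is accessible to group or others. A likely cause after the pod is recreated is the fsGroup setting in gp.yaml, which makes the kubelet adjust group ownership and permissions on the volume. Before running chmod, you can list the directories that would trip this check. find_bad_datadirs is a hypothetical helper; it assumes GNU stat (stat -c %a), as in the CentOS-based image, and only inspects directories one level below each base directory (for example $HOME/master/gpseg-1).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every directory one level below the given base
# directories whose permission bits are not exactly 700.
# Assumes GNU coreutils `stat -c %a`.
find_bad_datadirs() {
  local base dir
  for base in "$@"; do
    for dir in "$base"/*/; do
      [ -d "$dir" ] || continue   # glob matched nothing
      dir=${dir%/}
      if [ "$(stat -c %a "$dir")" != "700" ]; then
        echo "$dir"
      fi
    done
  done
}
```

On a segment host this could be run as find_bad_datadirs "$HOME/master" "$HOME/data" "$HOME/mirror"; empty output means the permissions are already what PostgreSQL expects.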
Commit the container as a new image:
$ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 135b86314ce1 registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
$ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
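Deploy gp with the new image, create a database and insert some data, then delete gp-0. One thing to watch: in my setup, deleting the pod left the Service unusable, so it has to be recreated. Also, because the tag 6.14.18 is reused, nodes that already cached the old image keep using it under imagePullPolicy: IfNotPresent; set imagePullPolicy: Always or remove the cached image first.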
$ kubectl delete pod gp-0
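$ kubectl delete -f svc.yaml && kubectl create -f svc.yaml    # svc.yaml is assumed to hold the Service definitions from gp.yaml
# Connect to the Greenplum master and run the recovery there
$ ssh -p 8222 gpadmin@192.168.x.x
# Run the recovery script; at the end gpstart asks for confirmation, so type y manually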
$ ./repair.sh
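Since running init.sh on a pod that already holds data would wipe it, one option is to let gp-0 pick the right script itself. This is a hypothetical sketch, not the method used above: choose_action treats the master data directory /home/gpadmin/master/gpseg-1 (as reported by gpstate earlier) as already initialized if it contains a PG_VERSION file, which every PostgreSQL data directory has.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper logic: decide whether the master still needs gpinitsystem
# (init.sh) or already holds data and only needs to be restarted (repair.sh).
choose_action() {
  local master_dir=${1:-/home/gpadmin/master/gpseg-1}
  if [ -f "$master_dir/PG_VERSION" ]; then
    echo repair
  else
    echo init
  fi
}

# Example use on gp-0 (not executed here):
#   case $(choose_action) in
#     init)   ./init.sh ;;
#     repair) ./repair.sh ;;
#   esac
```

This keeps the manual step of deciding between the two scripts out of the recovery path, at the cost of trusting that a PG_VERSION file means the rest of the cluster data is intact.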
Enabling mirror segments
Mirror segments are disabled by default. As before, enter a container, make the changes, and commit it as a new image.
- Edit artifact/prepare.sh (e.g. with vi) and add the following code:
MIRDATASTR=""
for i in $(seq 1 $SEG_NUMPERHOST); do
    MIRDATASTR="$MIRDATASTR $PREFIX/mirror"
done
# Add the %%MIRDATASTR%% substitution to the existing sed command
sed "s/%%PORT_BASE%%/$PORT_BASE/g; s|%%MIRDATASTR%%|$MIRDATASTR|g; s|%%PREFIX%%|$PREFIX|g;
- In artifact/gpinitsystem_config_template, uncomment every property that starts with MIRROR and set the mirror data directory as follows:
declare -a MIRROR_DATA_DIRECTORY=(%%MIRDATASTR%%)
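- Edit init.sh to use 3 segment hosts with 4 segments each (this also means raising the StatefulSet replicas to 4, one master plus three segment hosts, and using the same -s/-n values in repair.sh):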
artifact/prepare.sh -s 3 -n 4
- Commit the container as a new image
$ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 135b86314ce1 registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
$ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
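To see what the MIRDATASTR loop and the extra sed substitution produce, here is a self-contained sketch that applies them to the MIRROR_DATA_DIRECTORY template line. render_mirror_line is a hypothetical helper, and using /home/gpadmin as PREFIX is an assumption based on the data paths shown earlier. The loop emits one mirror directory per primary segment on the host, presumably so that the mirror list matches the length of the primary DATA_DIRECTORY list.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the prepare.sh change: build MIRDATASTR with one
# "$PREFIX/mirror" entry per segment, then substitute it into a template line.
render_mirror_line() {
  local prefix=$1 per_host=$2 template=$3
  local mirdatastr="" i
  for i in $(seq 1 "$per_host"); do
    mirdatastr="$mirdatastr $prefix/mirror"
  done
  # "|" is used as the sed delimiter because the paths contain "/"
  printf '%s\n' "$template" | sed "s|%%MIRDATASTR%%|$mirdatastr|g; s|%%PREFIX%%|$prefix|g"
}
```

For instance, render_mirror_line /home/gpadmin 2 'declare -a MIRROR_DATA_DIRECTORY=(%%MIRDATASTR%%)' prints declare -a MIRROR_DATA_DIRECTORY=( /home/gpadmin/mirror /home/gpadmin/mirror). With -n 4, as in the init.sh change above, the array holds four copies of /home/gpadmin/mirror; as with the repeated /home/gpadmin/data entries in the gpstate output, each segment ends up in its own gpsegN subdirectory.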