OpenShift is a PaaS developed by Red Hat that requires a paid subscription. Its community edition is OKD; the two install almost identically, differing mainly in the operating system and the application software layered on top. This article walks through installing OKD.
Cluster environment
Note:
The hosts in the cluster are ordinary physical PCs, not virtual machines;
Give each host at least 16 GB of RAM, especially the worker hosts: deploying "openshift-logging" alone consumes a fair amount of memory, so the more the better;
If you plan to deploy storage (Ceph, for example) on the worker hosts, install the required non-system disks before building the cluster; otherwise you will have to shut the machines down later to add them;
Building the cluster downloads a large number of images from Quay.io. On a slow network the installation will take a very long time, so consider setting up a mirror registry or improving your connectivity first.
The installation walkthrough below is divided into six parts:
Part 1: DHCP and DNS
DHCP and DNS must be configured before the cluster is installed.
1. DHCP
The cluster hosts install their operating system via PXE and get their network settings from DHCP. Two points matter in the DHCP setup (I used the DHCP service bundled with Windows Server 2008):
(1) Reserve a fixed IP address for each host's MAC address, which makes the DNS configuration straightforward.
(2) Configure the PXE-related options.
The boot file used here is "lpxelinux.0"; the reason is explained later.
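For readers not running the DHCP service on Windows, the same two points look roughly like this in ISC dhcpd syntax (a sketch only; the subnet range and MAC address are placeholders, not values from the actual environment):

```
subnet 10.1.99.0 netmask 255.255.255.0 {
  range 10.1.99.100 10.1.99.200;
  next-server 10.1.95.10;        # TFTP server address (placeholder)
  filename "lpxelinux.0";        # PXE boot file
}
host master-1 {
  hardware ethernet 52:54:00:aa:bb:cc;  # placeholder MAC
  fixed-address 10.1.99.11;             # fixed IP matching the DNS records
}
```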
2. DNS
I had planned to use the DNS service bundled with Windows Server 2008 as well, but I happened to have a CoreDNS container handy, so I used that. The zone file is as follows:
$ORIGIN okd-infra.wumi.ai. ; designates the start of this zone file in the namespace
$TTL 1h ; default expiration time of all resource records without their own TTL value
okd-infra.wumi.ai. IN SOA ns.okd-infra.wumi.ai. host-1.example.xyz. ( 2007120710 1d 2h 4w 1h )
okd-infra.wumi.ai. IN NS ns ; ns.okd-infra.wumi.ai is the nameserver for this zone
okd-infra.wumi.ai. IN A 10.1.95.9 ; IPv4 address of the zone apex
ns IN A 10.1.95.9 ; IPv4 address for ns.okd-infra.wumi.ai
bootstrap IN A 10.1.99.7
master-1 IN A 10.1.99.11
master-2 IN A 10.1.99.3
master-3 IN A 10.1.99.8
worker-1 IN A 10.1.99.14
worker-2 IN A 10.1.99.15
worker-3 IN A 10.1.99.16
etcd-0 IN A 10.1.99.11
etcd-1 IN A 10.1.99.3
etcd-2 IN A 10.1.99.8
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-2
api IN A 10.1.95.9 ; host-1 haproxy
api-int IN A 10.1.95.9 ; host-1 haproxy
*.apps IN A 10.1.95.9 ; host-1 haproxy
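The zone file needs a CoreDNS server block to load it; a minimal Corefile might look like the following (the file path and the upstream resolver address are assumptions, not taken from the actual container):

```
okd-infra.wumi.ai {
    file /etc/coredns/db.okd-infra.wumi.ai   # the zone file above
    log
    errors
}
.:53 {
    forward . 10.1.95.1   # placeholder upstream resolver for everything else
}
```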
Part 2: HAProxy
HAProxy load-balances access to the API server and to the Ingress routers. Straight to the configuration file:
/etc/haproxy/haproxy.cfg
defaults
mode tcp
option dontlognull
timeout connect 10s
timeout client 1m
timeout server 1m
#---------------------------------------------------------------------
frontend openshift-api-server
bind 10.1.95.9:6443
default_backend api-backend
mode tcp
#---------------------------------------------------------------------
backend api-backend
balance source
mode tcp
# server bootstrap 10.1.99.7:6443 check port 6443
server master-1 10.1.99.11:6443 check port 6443
server master-2 10.1.99.3:6443 check port 6443
server master-3 10.1.99.8:6443 check port 6443
#---------------------------------------------------------------------
frontend machine-config-server
bind 10.1.95.9:22623
default_backend machine-config-server
mode tcp
#---------------------------------------------------------------------
backend machine-config-server
balance source
mode tcp
# server bootstrap 10.1.99.7:22623 check port 22623
server master-1 10.1.99.11:22623 check port 22623
server master-2 10.1.99.3:22623 check port 22623
server master-3 10.1.99.8:22623 check port 22623
#---------------------------------------------------------------------
frontend ingress-http
bind 10.1.95.9:80
default_backend ingress-http
mode tcp
#---------------------------------------------------------------------
backend ingress-http
balance source
mode tcp
server worker-1 10.1.99.14:80 check port 80
server worker-2 10.1.99.15:80 check port 80
server worker-3 10.1.99.16:80 check port 80
#---------------------------------------------------------------------
frontend ingress-https
bind 10.1.95.9:443
default_backend ingress-https
mode tcp
#---------------------------------------------------------------------
backend ingress-https
balance source
mode tcp
server worker-1 10.1.99.14:443 check port 443
server worker-2 10.1.99.15:443 check port 443
server worker-3 10.1.99.16:443 check port 443
#---------------------------------------------------------------------
listen admin_stats # web stats page
bind 0.0.0.0:8081
mode http
log 127.0.0.1 local0 err
stats refresh 10s
stats uri /haproxy
stats realm welcome login\ Haproxy
stats hide-version
stats admin if TRUE
In the initial configuration, do not comment out the bootstrap entries in "backend api-backend" and "backend machine-config-server" (the listing above shows them commented out because it reflects the final, post-install state). Comment them out only when "./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info" reports that it is safe to remove the bootstrap machine.
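When that moment comes, commenting the lines out can be scripted with sed. A sketch, demonstrated on a throwaway copy of the config so it can be tried anywhere; on the real load balancer, point it at /etc/haproxy/haproxy.cfg instead:

```shell
# Demo copy; use /etc/haproxy/haproxy.cfg on the real load balancer.
CFG=/tmp/haproxy-demo.cfg
cat > "$CFG" <<'EOF'
backend api-backend
    server bootstrap 10.1.99.7:6443 check port 6443
    server master-1 10.1.99.11:6443 check port 6443
EOF
# Comment out every "server bootstrap ..." line, keeping indentation.
sed -i 's/^\([[:space:]]*\)\(server bootstrap \)/\1# \2/' "$CFG"
cat "$CFG"
```

On the real host, verify the result with "haproxy -c -f /etc/haproxy/haproxy.cfg" and then run "systemctl reload haproxy".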
Part 3: Download the required software and prepare the installation config
1. Download the required software
(1) Download the cluster installer, openshift-install, from "https://github.com/openshift/okd/releases". It drives the deployment of OpenShift 4 clusters on public clouds and on your own infrastructure.
(2) Install the cluster management tool, oc: download the latest release from "https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/". oc connects to and manages the cluster from the command line.
2. Customize the installation config file
Installing OpenShift 4 is completely different from OpenShift 3: the installation is driven by a config file written in advance. A sample follows (the file must be named install-config.yaml):
apiVersion: v1
baseDomain: wumi.ai
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0   # must be 0 when deploying onto user-provisioned bare metal
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: okd-infra
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'pullsecret obtained from redhat'
sshKey: 'sshkey that is created by ssh-keygen command'
Three of the machines serve as masters, running the apiserver, the etcd cluster, and so on.
"pullSecret": obtained from the Red Hat site. The images needed to deploy the cluster live on Quay.io, and this secret authenticates the pulls.
"sshKey": an SSH public key generated with "ssh-keygen". Any machine holding the matching private key can ssh into the cluster servers without a password, which is handy for debugging.
Part 4: Generate the Kubernetes manifests and Ignition configs
1. Generate the Kubernetes manifests
Create a directory "config-install", copy the install-config.yaml written in the previous step into it, and run:
./openshift-install create manifests --dir=config-install
The installer writes the manifest files into "config-install" (install-config.yaml is consumed in the process).
We do not want user pods scheduled on the masters, so edit "config-install/manifests/cluster-scheduler-02-config.yml", set "mastersSchedulable" to "false", and save the file.
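Since it is a one-word change, the edit can also be scripted; a sketch shown against an inline stand-in file (run the same sed against config-install/manifests/cluster-scheduler-02-config.yml in practice):

```shell
# Demo file standing in for cluster-scheduler-02-config.yml.
F=/tmp/scheduler-demo.yml
printf 'spec:\n  mastersSchedulable: true\n' > "$F"
# Flip the flag so user pods are not scheduled on the masters.
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' "$F"
cat "$F"
```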
2. Generate the Ignition configs, which customize the CoreOS installation (every host in an OKD 4 cluster runs Fedora CoreOS):
./openshift-install create ignition-configs --dir=config-install
The installer writes the Ignition files into "config-install" (the manifest files are consumed in the process).
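The .ign files are plain JSON, so they can be sanity-checked before being served over HTTP. A sketch, shown against an inline sample (point it at config-install/bootstrap.ign and friends in practice; the version number here is illustrative):

```shell
# Stand-in for one of the generated Ignition files.
IGN=/tmp/sample.ign
echo '{"ignition":{"version":"3.1.0"}}' > "$IGN"
# Fails loudly if the file is not valid JSON; otherwise prints the spec version.
python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["ignition"]["version"])' "$IGN"
```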
Part 5: Set up the PXE installation environment
With the Ignition files and OS images in hand, configure the PXE environment. The Ignition files, kernel, and initrd are downloaded by the cluster hosts over HTTP, so an HTTP server and a TFTP server are both required.
1. TFTP server
Two parts of the TFTP configuration matter:
The PXE boot file must be "lpxelinux.0", which is what enables fetching over HTTP (this is the reason promised in the DHCP section);
The pxelinux.cfg configuration:
# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0
label fedora-coreos-bootstrap
KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img \
console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda \
coreos.inst.ignition_url=http://10.1.95.10:8000/bootstrap.ign \
coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
label fedora-coreos-master
KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img \
console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda \
coreos.inst.ignition_url=http://10.1.95.10:8000/master.ign \
coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
label fedora-coreos-worker
KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img \
console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda \
coreos.inst.ignition_url=http://10.1.95.10:8000/worker.ign \
coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
2. HTTP server
I used nginx for the HTTP service. The nginx configuration itself is unremarkable: just place the required files in "/var/www/html". The cluster hosts request them during the PXE install:
aneirin@vm-1:/var/www/html$ ls -lh
total 732M
-rwxrwxrwx 1 root root 297K Oct 16 15:32 bootstrap.ign
-rwxrwxrwx 1 root root 70M Oct 15 10:44 fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img
-rwxrwxrwx 1 root root 12M Oct 15 10:44 fedora-coreos-32.20200923.3.0-live-kernel-x86_64
-rwxrwxrwx 1 root root 651M Oct 15 10:45 fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
-rwxrwxrwx 1 root root 11K Sep 5 2019 index.html //ships with nginx
-rwxrwxrwx 1 root root 612 Apr 22 11:50 index.nginx-debian.html //ships with nginx
-rwxrwxrwx 1 root root 1.9K Oct 16 15:32 master.ign
-rwxrwxrwx 1 root root 1.9K Oct 16 15:32 worker.ign
Part 6: Cluster installation
With the PXE environment ready, install the operating system on the seven hosts one after another (bootstrap -> master -> worker). There is no need to wait for one host to finish before starting the next; installing them all at the same time works fine.
Follow the bootstrap process with "./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info". When it prompts you to remove bootstrap, delete the bootstrap entries from the HAProxy configuration; the bootstrap host's job is done.
1. Configure the login credentials
export KUBECONFIG=<installation_directory>/auth/kubeconfig
You can put this line in "~/.bashrc" so that the KUBECONFIG environment variable is set on every future shell login.
2. Connect to the cluster and approve CSRs
With the credentials in place you can connect to the cluster as "system:admin", a user with full administrative rights (disable this account once the installation is finished; it is a security liability). Some certificate signing requests must be approved for the component installation to proceed:
oc get csr //list the CSRs awaiting approval
oc adm certificate approve <csr_name> //approve the named CSR
Repeat the two commands until no CSR remains in the Pending state; new CSRs may keep appearing for a few minutes.
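Approving CSRs one at a time gets tedious. The OpenShift documentation shows a one-liner that approves every pending CSR at once (it assumes KUBECONFIG already points at the cluster, so it only runs against a live installation):

```shell
# A CSR with an empty .status is still pending; approve all of them.
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
  | xargs --no-run-if-empty oc adm certificate approve
```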
3. Wait for the clusteroperators to finish installing
The OKD infrastructure components depend heavily on the various "clusteroperators"; wait until the "AVAILABLE" column reads "True" for every one of them:
aneirin@host-1:~$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.5.0-0.okd-2020-10-03-012432 True False False 3h22m
cloud-credential 4.5.0-0.okd-2020-10-03-012432 True False False 3h55m
cluster-autoscaler 4.5.0-0.okd-2020-10-03-012432 True False False 3h29m
config-operator 4.5.0-0.okd-2020-10-03-012432 True False False 3h29m
console 4.5.0-0.okd-2020-10-03-012432 True False False 3h24m
csi-snapshot-controller 4.5.0-0.okd-2020-10-03-012432 True False False 3h33m
dns 4.5.0-0.okd-2020-10-03-012432 True False False 3h41m
etcd 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
image-registry 4.5.0-0.okd-2020-10-03-012432 True False False 3h36m
ingress 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
insights 4.5.0-0.okd-2020-10-03-012432 True False False 3h36m
kube-apiserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
kube-controller-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
kube-scheduler 4.5.0-0.okd-2020-10-03-012432 True False False 3h40m
kube-storage-version-migrator 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
machine-api 4.5.0-0.okd-2020-10-03-012432 True False False 3h35m
machine-approver 4.5.0-0.okd-2020-10-03-012432 True False False 3h40m
machine-config 4.5.0-0.okd-2020-10-03-012432 True False False 3h28m
marketplace 4.5.0-0.okd-2020-10-03-012432 True False False 3h35m
monitoring 4.5.0-0.okd-2020-10-03-012432 True False False 3h25m
network 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
node-tuning 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
openshift-apiserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
openshift-controller-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h34m
openshift-samples 4.5.0-0.okd-2020-10-03-012432 True False False 3h24m
operator-lifecycle-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h43m
operator-lifecycle-manager-catalog 4.5.0-0.okd-2020-10-03-012432 True False False 3h43m
operator-lifecycle-manager-packageserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h34m
service-ca 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
storage 4.5.0-0.okd-2020-10-03-012432 True False False 3h33m
4. Configure storage for the image-registry
When OKD 4 is deployed outside the public clouds, no ready-made storage exists for the image-registry. Outside production you can hand it "emptyDir" as temporary storage (images are lost whenever the registry restarts, so never do this in production), which is enough to use the cluster's internal registry. The command:
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
All done:
aneirin@host-1:~$ ./openshift-install --dir=config-install wait-for install-complete
INFO Waiting up to 30m0s for the cluster at https://api.okd-infra.wumi.ai:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/aneirin/okd4/config-install/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.okd-infra.wumi.ai
INFO Login to the console with user: "kubeadmin", and password: "CaEJY-myzAi-R7Wtj-XXXX"
INFO Time elapsed: 1s
This is only the first step of a long march with OpenShift 4; much work remains, such as monitoring, logging, and storage. Stay tuned!