1. Preface
(1) Background
The main goal is to achieve the following:
- Independence: different users' environments are isolated from one another, cannot access each other, and can be used at the same time (the only shared channel is a shared folder).
- Isolation: users cannot access the host machine (again, the only channel is the shared folder).
- Freedom: each user can work as if on a Linux machine of their own, with convenient access and maximal privileges: installing software freely, accessing the network freely, and so on. (In practice the experience is essentially the same as renting a GPU cloud instance.)
- GPU: most importantly, every user can use GPU resources.
- Control: freedom aside, when necessary (for example when too many people are active at once) it must still be possible to limit each user's resources (CPU, memory, GPU).
(2) Environment
Everything I used is the latest version; if you are worried about stability, consider stepping back a version or two.
- kfctl v0.6.2-rc.2
- ksonnet version: 0.13.1
- k8s v1.15.3
- docker 19.03.1
- NVIDIA Docker: 2.0.3
There are three machines: one serves as the master, and the other two have GPUs and act as nodes.
IP Address | Role | CPU (cores) | Memory (GB) | System | GPU |
---|---|---|---|---|---|
192.168.1.112 | master | 2 | 8 | Ubuntu 16.04 | none |
192.168.1.113 | node1 | 2 | 8 | Ubuntu 16.04 | 1070ti |
192.168.1.114 | node2 | 2 | 8 | Ubuntu 16.04 | 1070ti |
2. Installing k8s
(1) Install the base environment first
Save the script below as setup.sh; it needs to be run on all three machines.
#!/bin/sh
# Docker repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial edge" | sudo tee /etc/apt/sources.list.d/docker.list
# Kubernetes repository
curl -s "https://packages.cloud.google.com/apt/doc/apt-key.gpg" | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# nvidia-docker2 repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# CUDA repository
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/ /"
# Install the NVIDIA driver. This part can be dropped on the master; leaving it in does no harm either, since the driver will not install there anyway (no GPU).
wget http://cn.download.nvidia.com/XFree86/Linux-x86_64/430.40/NVIDIA-Linux-x86_64-430.40.run
chmod +x NVIDIA-Linux-x86_64-430.40.run
./NVIDIA-Linux-x86_64-430.40.run
apt-get update
# Install CUDA, Docker, nvidia-docker2 and nfs-common
apt-get -y install cuda docker-ce nvidia-docker2 nfs-common
# Switch Docker's default runtime to nvidia and its cgroup driver to systemd
echo '{"default-runtime": "nvidia","runtimes": {"nvidia": {"path": "/usr/bin/nvidia-container-runtime","runtimeArgs": []}},"exec-opts": ["native.cgroupdriver=systemd"]}' > /etc/docker/daemon.json
systemctl restart docker
# Install the Kubernetes toolchain
sudo swapoff -a
sudo apt-get install -y kubelet kubeadm kubectl
bash setup.sh
- The process is fairly long and involves a few interactive prompts along the way.
- Note that I installed from outside China; if you are inside China, consider replacing the Google (and similar) sources above with domestic mirrors, as sketched below.
- A few downloaded files will be left in the directory where the script runs; the script does not clean them up, so just be aware of them.
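If you do need domestic mirrors, the idea is simply to swap the repository lines near the top of setup.sh. The Aliyun mirror URLs below are my own illustration rather than something this setup was tested with, so treat them as an assumption and substitute whichever mirror you trust:
# Example only (assumed Aliyun mirrors) - would replace the Docker and Kubernetes repo lines in setup.sh
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
echo "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu xenial edge" | sudo tee /etc/apt/sources.list.d/docker.list
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list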
Finally, verify that Docker can reach the GPU by running docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi:
root@ubuntuNode1:/home/ubuntu# docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Sat Aug 24 16:13:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:0B:00.0 Off | N/A |
| 24% 48C P8 7W / 180W | 0MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(2) Join the three machines into a cluster
Run the following on the master to initialize it as the master:
# Keep the service and pod CIDRs as they are (flannel expects 10.244.0.0/16).
# Replace the advertise address with your own master's IP.
# --ignore-preflight-errors=SystemVerification forces the install even though this
# Kubernetes/Docker version combination has not been validated upstream.
kubeadm init --service-cidr 10.96.0.0/12 --pod-network-cidr 10.244.0.0/16 \
  --apiserver-advertise-address 192.168.1.112 \
  --ignore-preflight-errors=SystemVerification \
  --token f6ncoz.loxuwn6pp5187ev4
Once the command above finishes, it prints a kubeadm join command to run on the other nodes. Mine looked like this:
kubeadm join 192.168.1.112:6443 --token f6ncoz.loxuwn6pp5187ev4 \
--discovery-token-ca-cert-hash sha256:eb0c7962fd8b2328ea38aa0f003186a8ace1c5af8b15dc1fa1e34054745a5bba \
  --ignore-preflight-errors=SystemVerification  # skips the version check, as in the init command
Run this on both nodes.
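A small aside: if the join command gets lost or the token expires (kubeadm tokens last 24 hours by default), you can print a fresh one on the master. This is plain kubeadm behaviour, not something specific to this setup:
# run on the master; prints a complete kubeadm join command with a new token
kubeadm token create --print-join-command
# remember to append --ignore-preflight-errors=SystemVerification to the printed command, as above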
(3) Deploy the flannel plugin
1. First set up the kubeconfig, so that kubectl on the master can reach the cluster:
mkdir -p ~/.kube && cp /etc/kubernetes/admin.conf ~/.kube/config
2. Then deploy the plugin by running the following on the master:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
3. Verify; once every node shows Ready, you are done.
kubectl get node
root@ubuntuCpu:/home/ubuntu# kubectl get node
NAME STATUS ROLES AGE VERSION
ubuntucpu Ready master 22h v1.15.3
ubuntunode1 Ready <none> 22h v1.15.3
ubuntunode2 Ready <none> 22h v1.15.3
root@ubuntuCpu:/home/ubuntu#
(4) Deploy the NVIDIA device plugin
This plugin is what lets Kubernetes see and schedule the GPUs. Deployment is simple; run the following on the master:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.12/nvidia-device-plugin.yml
Verify with kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu".
The result shows one GPU on each of my two nodes:
root@ubuntuCpu:/home/ubuntu# kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME GPU
ubuntucpu <none>
ubuntunode1 1
ubuntunode2 1
root@ubuntuCpu:/home/ubuntu#
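Beyond the column output above, an optional end-to-end check is to schedule a throwaway pod that requests one GPU and runs nvidia-smi. This is just a sketch; the pod name gpu-smoke-test is an arbitrary choice of mine:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
# once it has completed, the log should show the familiar nvidia-smi table
kubectl logs gpu-smoke-test
kubectl delete pod gpu-smoke-test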
3. Deploying kubeflow
(1) Overview
In the month I spent working with Kubeflow, its tutorial was updated many times, which says a lot about how complex and problem-prone the deployment is. The tool versions are:
- kfctl v0.6.2-rc.2
- ksonnet version: 0.13.1
- k8s v1.15.3
- docker 19.03.1
- NVIDIA Docker: 2.0.3
Following the official guide, I got a successful deployment on the first try. The guide is here: https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto
(2) Deploy MetalLB
This part is collapsed by default in the official guide, but it is actually quite important.
1. Deploy the plugin
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
2. Configure MetalLB
Pay attention here: the addresses below define the IP pool that MetalLB allocates from. I changed the range to IPs on the same subnet as the cluster; the pool needs at least one address.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.0.0.100-10.0.0.110
EOF
3. Verify the MetalLB installation. I will not go through this in detail; see the official docs. A minimal check is sketched below.
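For what it's worth, the quickest sanity check (assuming you do not want to run the full LoadBalancer test from the docs) is just to look at the MetalLB pods:
kubectl get pods -n metallb-system
# the controller pod plus one speaker pod per node should all be Running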
(3) Set up the default StorageClass
1. Set up an NFS server (see the reference article)
Install it (any machine will do; I chose the master):
sudo apt install nfs-kernel-server
Configure it:
sudo vi /etc/exports
Add the line /home/nfs4 *(rw,sync,no_root_squash). This is the exported NFS path; make sure it sits on a volume with plenty of space, ideally 100 GB or more, because each PVC created later is 10 GB and up. The finished file looks like this:
# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4 gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes gss/krb5i(rw,sync,no_subtree_check)
#
/home/nfs4 *(rw,sync,no_root_squash)
Restart the service:
sudo /etc/init.d/nfs-kernel-server restart
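Before moving on, it is worth confirming that the export is actually visible. showmount ships with the nfs-common package that setup.sh already installs; replace the IP with your NFS server's address (the master, 192.168.1.112, in my case):
sudo exportfs -ra               # re-read /etc/exports
showmount -e 192.168.1.112      # should list /home/nfs4 (or whatever path you exported)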
2. Create a static PV and PVC for testing (see the reference article)
First, use a static PV and PVC to check whether the NFS setup works at all:
pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv1
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: [path exported by the configured NFS server]
    server: [IP address of the configured NFS server]
pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mypvc1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
Run kubectl create -f pv.yaml and kubectl create -f pvc.yaml, then run kubectl get pvc; a STATUS of Bound means the claim has been bound.
root@ubuntuCpu:/home/ubuntu# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-claim2 Bound pvc-db5c180d-8d9c-417e-a637-0de15d88f521 1Mi RWO nfs 10h
If the two bind automatically after creation, the NFS setup is working properly; only then should you proceed with the steps below.
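Since mypv1 and mypvc1 were only a smoke test, you can optionally remove them before setting up dynamic provisioning:
kubectl delete -f pvc.yaml
kubectl delete -f pv.yaml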
3. Run nfs-client-provisioner
One precondition: nfs-common must be installed on every node, because that is what the nodes use to mount the NFS share (the setup.sh above already installs it).
To have PVs generated dynamically, you run an NFS provisioner service: you feed it the parameters of the already-configured NFS system, and it creates PVs on behalf of users. The official recommendation is to run it as a Deployment with a single replica, although a DaemonSet or other workload types also work; all of this is covered in the official docs.
Before creating the Deployment, the official docs say to complete the configuration described in their Step 3. (That sentence comes from the reference article; I did not actually perform this step.)
Write rbac.yaml as follows:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-provisioner
  apiGroup: rbac.authorization.k8s.io
Write serviceaccount.yaml as follows:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
Note that the image used in the Deployment differs depending on whether you point at an already-configured NFS system or build your own provisioner. I stepped into a big hole here: after switching to an existing NFS system I kept the old image in the yaml, got persistent errors, and only realized the cause after a long debugging session.
Write deployment.yaml as follows:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: nfs-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccount: nfs-provisioner
      containers:
        - name: nfs-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: example.com/nfs
            - name: NFS_SERVER
              value: [IP address of the configured NFS server]
            - name: NFS_PATH
              value: [mount path of the configured NFS server]
      volumes:
        - name: nfs-client-root
          nfs:
            server: [IP address of the configured NFS server]
            path: [mount path of the configured NFS server]
Note that the image given in the official docs cannot be pulled normally from inside China; I found an Alibaba Cloud image online as a substitute (see the reference link).
In this image the volume's mountPath must remain /persistentvolumes; changing it causes errors at runtime.
After creating the Deployment, check that the Pod runs properly. If errors show up later, kubectl logs on this Pod is the place to look when debugging, as shown below.
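Concretely, the check I have in mind looks like the following; the app=nfs-provisioner label comes from the Deployment above:
kubectl get pods -l app=nfs-provisioner
# if the pod is not Running, the logs usually point at an NFS mount or permission problem
kubectl logs -l app=nfs-provisioner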
4. Create the StorageClass
Write and create storageclass.yaml as follows:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs
provisioner: example.com/nfs
Mark this StorageClass as the default, otherwise kubeflow will not use it: kubectl patch storageclass nfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
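To confirm the annotation took effect, list the storage classes; the nfs class should now be flagged as (default):
kubectl get storageclass
# expected output is something along the lines of:
# NAME            PROVISIONER       AGE
# nfs (default)   example.com/nfs   ...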
5. Create a test claim
Next, create a test claim to check that the StorageClass works properly.
Write and create test-claim.yaml as follows; make sure storageClassName matches the name of the StorageClass created above.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim1
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
  storageClassName: nfs
After creating it, check with kubectl get pvc that the newly created PVC is bound to a PV automatically.
(4) Deploy kubeflow
1. ksonnet
wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz
tar -xaf ks_0.13.1_linux_amd64.tar.gz
cp ks_0.13.1_linux_amd64/ks /usr/bin/ks
2. kfctl
wget https://github.com/kubeflow/kubeflow/releases/download/v0.6.2-rc.2/kfctl_v0.6.2-rc.2_linux.tar.gz
tar -xaf kfctl_v0.6.2-rc.2_linux.tar.gz
cp kfctl_v0.6.2-rc.2_linux/kfctl /usr/bin/kfctl
3. Install kubeflow
Since the two steps above already copied kfctl and ks into /usr/bin, I skipped the official guide's export PATH=$PATH:"<path to kfctl>" step.
export KFAPP="/home/ubuntu/kfapp"
export CONFIG="https://raw.githubusercontent.com/kubeflow/kubeflow/v0.6.1/bootstrap/config/kfctl_existing_arrikto.0.6.yaml"
# Specify credentials for the default user.
export KUBEFLOW_USER_EMAIL="admin@kubeflow.org"
export KUBEFLOW_PASSWORD="12341234"
kfctl init ${KFAPP} --config=${CONFIG} -V
cd ${KFAPP}
kfctl generate all -V
kfctl apply all -V
As I recall, near the end of kfctl apply all -V it complains that a certain namespace has not been created; when that happens, open another terminal and run kubectl create ns <namespace name>, and the apply will carry on.
4. Verify
Once everything is Running you are in good shape; the next step is to log in to kubeflow and create a Jupyter notebook to verify that the GPU can be used (see the note after the pod listing below for finding the login address).
root@ubuntuCpu:/home/ubuntu# kubectl get pods -n kubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-bootstrap-stateful-set-0 1/1 Running 0 11h
admission-webhook-deployment-b77bd65c5-dfhhd 1/1 Running 0 11h
argo-ui-6db54c878-df5pz 1/1 Running 0 11h
centraldashboard-54c456fc46-9s7nl 1/1 Running 0 11h
dex-6847f88df6-wzcsj 1/1 Running 0 11h
jupyter-web-app-deployment-6f96544f6f-pck6t 1/1 Running 0 11h
katib-controller-55ccdcc6c8-2nf5v 1/1 Running 0 11h
katib-db-b48df7777-wbf5s 1/1 Running 0 11h
katib-manager-6944b56f96-4hm45 1/1 Running 1 11h
katib-manager-rest-6f6b8f4b54-mzbjv 1/1 Running 0 11h
katib-suggestion-bayesianoptimization-66c6764d5b-7mwls 1/1 Running 0 11h
katib-suggestion-grid-5c758dbf4b-rfvjh 1/1 Running 0 11h
katib-suggestion-hyperband-76cdd95f46-q5ddx 1/1 Running 0 11h
katib-suggestion-nasrl-6bc7855ddd-wgklg 1/1 Running 0 11h
katib-suggestion-random-65c489b584-4shxg 1/1 Running 0 11h
katib-ui-57bcbb9f56-d67bf 1/1 Running 0 11h
metacontroller-0 1/1 Running 0 11h
metadata-db-8d9b95598-6x2tc 1/1 Running 0 11h
metadata-deployment-545d79c747-64777 1/1 Running 3 11h
metadata-deployment-545d79c747-kzj5q 1/1 Running 3 11h
metadata-deployment-545d79c747-x4sf6 1/1 Running 3 11h
metadata-ui-76b5498765-766sg 1/1 Running 0 11h
minio-56dc668bd-z7hrp 1/1 Running 0 11h
ml-pipeline-567b7d6b44-qklkn 1/1 Running 0 11h
ml-pipeline-persistenceagent-69f558486c-7lsbh 1/1 Running 0 11h
ml-pipeline-scheduledworkflow-869954f57c-v86km 1/1 Running 0 11h
ml-pipeline-ui-c8d7b55cc-tdhmq 1/1 Running 0 11h
ml-pipeline-viewer-controller-deployment-566d875695-5p85f 1/1 Running 0 11h
mysql-75654987c5-hbpml 1/1 Running 0 11h
notebook-controller-deployment-58c6c5d8cc-58z6q 1/1 Running 0 11h
profiles-deployment-84f47f6c9b-9k69z 2/2 Running 0 11h
pytorch-operator-69d875b748-tw42m 1/1 Running 0 11h
spartakus-volunteer-6cfc55fd88-tkdx5 1/1 Running 0 11h
tensorboard-5f685f9d79-48rw7 1/1 Running 0 11h
tf-job-dashboard-5fc794cc7c-z7vld 1/1 Running 0 11h
tf-job-operator-6c9674bcd8-tsrbq 1/1 Running 0 11h
workflow-controller-5b4764bc47-lkcht 1/1 Running 0 11h
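To find the address to log in at: my recollection is that the kfctl_existing_arrikto configuration exposes kubeflow through the istio-ingressgateway LoadBalancer service, which is exactly what MetalLB hands an IP to. Take the service name and namespace below as assumptions to check against your own cluster:
kubectl get svc -n istio-system istio-ingressgateway
# the EXTERNAL-IP column should show an address from the MetalLB pool;
# browse to https://<EXTERNAL-IP>/ and sign in with KUBEFLOW_USER_EMAIL / KUBEFLOW_PASSWORD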