Two problems the underlying Kubernetes network must solve
- Help Kubernetes assign IP addresses to the Docker containers on every node so that no two containers conflict.
- Build an overlay network on top of these IP addresses, through which packets are delivered unmodified to the target container.
Open vSwitch
Open vSwitch can build several kinds of communication tunnels, for example Open vSwitch with GRE/VXLAN. In the Kubernetes scenario we mainly build L3-to-L3 tunnels. The network architecture is as follows.
The steps to complete are:
- Delete the docker0 bridge created by the Docker daemon to avoid docker0 address conflicts.
- Manually create a Linux bridge and configure its IP address by hand.
- Create an Open vSwitch bridge (ovs-bridge) and use ovs-vsctl to add a GRE port to it. When adding the port, set the remote IP to the IP address of the peer node; do this once for every peer node.
- Attach ovs-bridge as a network interface to Docker's bridge (docker0 or the manually created bridge).
- Restart the ovs-bridge and the Docker bridge, and add a route for the Docker subnet pointing at the Docker bridge.
Network communication flow
When an application inside a container accesses the address of another container, the packet is sent to the docker0 bridge via the container's default route. Because the OVS bridge is attached as a port on docker0, docker0 hands the packet to the OVS bridge, and OVS delivers it to the peer node through the GRE tunnel.
Configuration steps
Install OVS on both nodes
yum install openvswitch
Disable SELinux and reboot
# vi /etc/selinux/config
SELINUX=disabled
Check the OVS status
systemctl status openvswitch
Create the bridge and the GRE tunnel
On each node, create the OVS bridge br0, then create a GRE tunnel on the bridge.
# Create the OVS bridge
ovs-vsctl add-br br0
# Create the GRE tunnel to the peer; remote_ip is the IP of the peer's eth0. Note that when configuring the peer side on the other machine, change the IP to this machine's IP.
ovs-vsctl add-port br0 gre1 -- set interface gre1 type gre option:remote_ip=192.168.18.128
# Attach br0 to the local docker0 bridge so that container traffic enters the tunnel via OVS
brctl addif docker0 br0
# Bring up the br0 and docker0 bridges
ip link set dev br0 up
ip link set dev docker0 up
Because the docker0 subnets of the two machines (host IPs ending in 128 and 131) are 172.17.43.0/24 and 172.17.42.0/24 respectively, routes to both subnets must pass through the local docker0 bridge, and one of the /24 subnets is reached over the OVS GRE tunnel to the peer. Therefore, a routing rule through the docker0 bridge must be configured on each node:
ip route add 172.17.0.0/16 dev docker0
Flush the iptables rules installed by Docker as well as the default Linux rules; the latter contain a rule that rejects ICMP packets at the firewall.
iptables -t nat -F; iptables -F
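To verify the setup, a minimal sketch (172.17.43.2 is a hypothetical container IP on the peer's docker0 subnet; adjust to your environment):
# Confirm br0 and its GRE port exist
ovs-vsctl show
# Confirm the routes for the Docker subnets point at docker0
ip route | grep 172.17
# From one node, ping a container running on the other node
ping -c 3 172.17.43.2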
Direct routing
Network model
By default, the docker0 IP addresses are not visible to the node network. By configuring routes manually, Pods on different nodes can reach each other.
Implementation
One way is to deploy a multilayer switch (MLS); another is to add static routes on each node, as below.
Assume the docker0 bridge on which POD1 sits uses the 10.1.10.0 subnet and NODE1's address is 192.168.1.128, while the docker0 on which POD2 sits uses the 10.1.20.0 subnet and NODE2's address is 192.168.1.129.
1. On NODE1, add a static route to the docker0 subnet on NODE2:
route add -net 10.1.20.0 netmask 255.255.255.0 gw 192.168.1.129
2. On NODE2, add a static route to the docker0 subnet on NODE1:
route add -net 10.1.10.0 netmask 255.255.255.0 gw 192.168.1.128
3. Verify connectivity: from NODE1, ping the docker0 network on NODE2:
ping 10.1.20.1
For large clusters, the approach is: manually create a Linux bridge to avoid the IP range conflicts caused by the docker0 bridge that the Docker daemon creates, then use Docker's --bridge option to point Docker at that bridge.
Then run the quagga route-learning software on each node so that the routes no longer have to be maintained by hand.
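If dynamic routing is not available, the static-route approach can at least be scripted; a hedged sketch, where the subnet-to-node mapping is an assumption that simply reuses the two-node example above:
# Run on every node: add a route for every other node's docker0 subnet (requires bash 4+).
declare -A subnets=( [192.168.1.128]=10.1.10.0/24 [192.168.1.129]=10.1.20.0/24 )
self=$(hostname -I | awk '{print $1}')
for node in "${!subnets[@]}"; do
  [ "$node" = "$self" ] && continue
  ip route add "${subnets[$node]}" via "$node"
done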
Calico
How Calico works
Calico creates and manages a flat layer-3 network and assigns every workload a fully routable IP address. Workloads can communicate without IP encapsulation or network address translation, which gives bare-metal performance, simplifies troubleshooting, and improves interoperability. In environments that require an overlay network, Calico offers IP-in-IP tunneling, and it can also be combined with other overlay networks such as flannel.
Calico also provides dynamic configuration of network security rules. With Calico's simple policy language, you get fine-grained control over communication between containers, virtual machine workloads, and bare-metal hosts.
Calico v3.4 was released on 2018-12-10 and integrates well with Kubernetes, OpenShift, and OpenStack.
Note: when using Calico with the Mesos, DC/OS, and Docker orchestrators, only Calico v2.6 is currently supported.
Calico's IPIP and BGP modes
- IPIP mode builds a tunnel between the nodes' routes and connects the two networks over it. When IPIP mode is enabled, Calico creates a virtual network interface named "tunl0" on every node, as shown in the figure below.
- BGP mode uses the physical machine itself as the virtual router (vRouter) and creates no additional tunnel.
Calico implements a vRouter in the Linux kernel to forward traffic. Via the BGP protocol it broadcasts each node's routing information across the whole Calico network and automatically installs the forwarding rules needed to reach other nodes.
In small clusters, Calico's BGP mode connects all nodes directly in a full mesh; in larger clusters this is done through an additional BGP route reflector.
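The mode is controlled per IP pool. A hedged sketch using the Calico v3 API (the pool name and CIDR are assumptions; apply with calicoctl apply -f):
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Always      # "Never" for pure BGP, "CrossSubnet" to tunnel only between subnets
  natOutgoing: true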
Main Calico components
Calico relies on the Linux kernel's native routing and iptables firewall features. All traffic to and from each container, virtual machine, and physical host traverses these kernel rules before being routed to its destination.
- Felix: the main Calico agent; it runs on every machine that hosts endpoint resources and manages them.
- calicoctl: lets advanced policy and networking be configured from the command line.
- orchestrator plugins: provide tight integration and synchronization with the popular cloud orchestration tools.
- key/value store: holds Calico's policy configuration and network state; currently etcd v3 or the Kubernetes API is used.
- calico/node: runs on every host, reads the relevant policy and network configuration from the key/value store, and implements it in the Linux kernel.
- Dikastes/Envoy: optional Kubernetes sidecars that secure workload-to-workload communication with mutual TLS authentication and add application-layer policy controls.
Felix
Felix is a daemon that runs on every machine that provides endpoint resources. In most cases that means it runs on the host nodes that host the containers or VMs. Felix is responsible for programming the routes and ACL rules, and anything else the host requires, to provide the connectivity the endpoints on that host need.
Depending on the orchestration environment, Felix is responsible for the following tasks:
- Interface management. Felix programs some information about interfaces into the kernel so that the kernel handles the traffic emitted by each endpoint correctly. In particular, it ensures that the host answers ARP requests from every workload and enables IP forwarding on the interfaces it manages. It also watches interfaces appear and disappear so that the programming for them is applied at the right time.
- Route programming. Felix programs routes to the endpoints on its host into the Linux kernel FIB (Forwarding Information Base). This ensures packets destined for endpoints on that host are forwarded correctly.
- ACL programming. Felix also programs ACLs into the Linux kernel. These ACLs make sure that only valid traffic can flow between endpoints and that endpoints cannot bypass Calico's security measures.
- State reporting. Felix provides data about the health of the network. In particular, it reports errors and problems that occur while configuring its host. This data is written to etcd, where it is visible to the other components and to operators of the network.
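What Felix programs can be inspected directly on a node; a minimal sketch (the "cali" prefixes are how Calico names its workload interfaces and iptables chains):
# Routes to local workloads point at the per-pod cali* veth interfaces
ip route | grep cali
# Calico's iptables chains are prefixed with "cali-"
iptables-save | grep '^:cali-' | head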
Orchestrator Plugin
Each major cloud orchestration platform has its own Calico network plugin (for example OpenStack and Kubernetes). The purpose of these plugins is to bind Calico more tightly into the orchestrator, letting users manage the Calico network just as they manage the networking built into the orchestrator.
A good example of an orchestrator plugin is the Calico Neutron ML2 driver. It integrates with Neutron's ML2 plugin and lets users configure the Calico network through Neutron API calls, giving seamless integration with Neutron.
The orchestrator plugin is responsible for the following tasks:
- API translation. Every cloud orchestrator inevitably has its own API for managing networks. The orchestrator plugin's main job is to translate those API calls into Calico's data model and store the result in Calico's datastore. Some of this translation is trivial; other parts are more involved, turning a single complex operation (for example, live migration) into the series of simpler operations the Calico network expects.
- Feedback. Where needed, the orchestrator plugin feeds information from the Calico network back to the orchestrator, for example reporting Felix liveness and marking certain endpoints as failed when network setup fails.
etcd
etcd is a distributed key/value store that focuses on data consistency. Calico uses etcd for communication between its components and as a consistent datastore, which ensures Calico can always build an accurate view of the network.
Depending on the orchestrator plugin, etcd is either the primary datastore or a lightweight mirror of a separate datastore. For example, in an OpenStack deployment, the OpenStack database is considered the source of truth, and etcd mirrors the network configuration from it and serves the other Calico components.
The etcd component is spread throughout the deployment. It can be divided into two groups of hosts: the core cluster and the proxies.
For small deployments, the core cluster can be a single-node etcd cluster (usually co-located with the orchestrator plugin). This model is simple but offers no redundancy for etcd; if etcd fails, the orchestrator plugin has to rebuild the database, e.g. for OpenStack the plugin resynchronizes the state from the OpenStack database into etcd.
In larger deployments, the core cluster is scaled according to the etcd administration guide.
In addition, an etcd proxy runs on every machine that runs Felix or an orchestrator plugin. This reduces the load on the etcd core cluster and hides the details of the etcd cluster from the host nodes. Where a machine hosts both an etcd cluster member and the orchestrator plugin, the etcd proxy on that machine can be dropped.
etcd is responsible for the following tasks:
- Data storage. etcd stores the Calico network's data in a distributed, consistent, and fault-tolerant way (for cluster sizes of at least three etcd nodes). This keeps the Calico network in a known-good state while tolerating the failure or unreachability of individual machines running etcd. This distributed storage of Calico network data also improves the components' ability to read from the database.
- Communication. etcd also serves as the communication channel between components. Non-etcd components watch certain points in the key/value space so that they see any changes that are made and can respond to them promptly. This allows state to be committed to the database and then trigger further network configuration based on that state.
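When the etcdv3 datastore is used, that keyspace can be inspected with etcdctl; a hedged sketch (the /calico/ prefix is the conventional one, and the --endpoints/TLS flags are omitted here):
export ETCDCTL_API=3
# List the keys Calico has written
etcdctl get /calico/ --prefix --keys-only | head
# Watch for changes, which is how components are notified of updates
etcdctl watch /calico/ --prefix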
BGP Client (BIRD)
Calico deploys a BGP client on every node that also runs Felix. The role of the BGP client is to read the routing information that Felix programs into the kernel and distribute it around the data center.
The BGP client is responsible for the following task:
- Route distribution. When Felix inserts routes into the Linux kernel FIB, the BGP client picks them up and distributes them to the other nodes in the cluster.
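The resulting BGP sessions can be checked on any node; a hedged example (assumes calicoctl is installed on that node):
# Shows each BGP peer (node-to-node mesh or route reflector) and whether the session is Established
calicoctl node status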
BGP Route Reflector (BIRD)
For larger deployments, plain BGP can become a limiting factor because it requires every BGP client to connect to every other BGP client in a mesh topology. The number of connections keeps growing, quickly becomes hard to maintain, and can even fill up the routing tables of some devices.
Therefore, for larger deployments Calico recommends deploying a BGP route reflector. On the Internet such a component is commonly used as the central point that BGP clients connect to, so that they do not need to talk to every other BGP client in the cluster. Multiple BGP route reflectors can be deployed for redundancy. Route reflectors only help manage the BGP network; no endpoint data flows through them.
In Calico this BGP component is also most commonly BIRD, configured to run as a route reflector rather than as a standard BGP client.
The BGP route reflector is responsible for the following task:
- Centralized route distribution. When a Calico BGP client advertises routes from its FIB to the route reflector, the route reflector advertises those routes on to the other nodes in the cluster.
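A hedged sketch of pointing a cluster at a route reflector with the Calico v3 API (the reflector address 192.168.18.10 and AS number 64512 are placeholders; apply with calicoctl apply -f):
# Turn off the full node-to-node mesh
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 64512
---
# Peer every node with the route reflector instead
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-to-rr
spec:
  peerIP: 192.168.18.10
  asNumber: 64512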
What is BIRD
BIRD began as a school project at the Faculty of Mathematics and Physics of Charles University in Prague; the name is an acronym of BIRD Internet Routing Daemon. It is currently developed and supported by CZ.NIC Labs.
The BIRD project aims to develop a fully functional dynamic IP routing daemon, primarily (but not exclusively) targeting Linux, FreeBSD, and other UNIX-like systems, distributed under the GNU General Public License. See the official site https://bird.network.cz/ for details.
As an open-source routing daemon, BIRD is designed to support the following features:
- both IPv4 and IPv6 protocols
- multiple routing tables
- the Border Gateway Protocol (BGPv4)
- the Routing Information Protocol (RIPv2, RIPng)
- the Open Shortest Path First protocol (OSPFv2, OSPFv3)
- the Babel Routing Protocol
- the Router Advertisements for IPv6 hosts
- a virtual protocol for exchange of routes between different routing tables on a single host
- a command-line interface allowing on-line control and inspection of status of the daemon
- soft reconfiguration (no need to use complex online commands to change the configuration, just edit the configuration file and notify BIRD to re-read it and it will smoothly switch itself to the new configuration, not disturbing routing protocols unless they are affected by the configuration changes)
- a powerful language for route filtering
Deploying Calico in Kubernetes
- Modify the kube-apiserver startup parameter: --allow-privileged=true (Calico needs privileged mode).
- Modify the kubelet startup parameter: --network-plugin=cni
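A hedged sketch of where these flags end up (the exact unit files are assumptions; adjust to however your cluster services are managed):
# kube-apiserver startup arguments (excerpt)
--allow-privileged=true
# kubelet startup arguments (excerpt; the CNI directories shown are the conventional defaults)
--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin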
Assume the Kubernetes environment contains two nodes: node1 (192.168.18.3) and node2 (192.168.18.4).
Create the Calico services, mainly calico-node and the calico policy controller. The Kubernetes resource objects required are:
- ConfigMap calico-config, holding Calico's configuration parameters
- Secret calico-etcd-secrets, used for TLS connections to etcd
- The calico/node container, deployed on every node as a DaemonSet
- The Calico CNI binaries and network configuration, installed on every node (done by the install-cni container)
- A Deployment named calico/kube-policy-controller, which implements network policies for the Pods in the cluster
The official Calico installation YAML for Kubernetes follows.
calico-etcd.yaml
---
# Source: calico/templates/calico-etcd-secrets.yaml
# The following contains k8s Secrets for use with a TLS enabled etcd cluster.
# For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
apiVersion: v1
kind: Secret
type: Opaque
metadata:
name: calico-etcd-secrets
namespace: kube-system
data:
# Populate the following with etcd TLS configuration if desired, but leave blank if
# not using TLS for etcd.
# The keys below should be uncommented and the values populated with the base64
# encoded contents of each file that would be associated with the TLS data.
# Example command for encoding a file contents: cat <file> | base64 -w 0
# etcd-key: null
# etcd-cert: null
# etcd-ca: null
---
# Source: calico/templates/calico-config.yaml
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# Configure this with the location of your etcd cluster.
# etcd service address
etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"
# If you're using TLS enabled etcd uncomment the following.
# You must also populate the Secret below with these files.
etcd_ca: "" # "/calico-secrets/etcd-ca"
etcd_cert: "" # "/calico-secrets/etcd-cert"
etcd_key: "" # "/calico-secrets/etcd-key"
# Typha is disabled.
typha_service_name: "none"
# Configure the backend to use.
calico_backend: "bird"
# Configure the MTU to use
veth_mtu: "1440"
# The CNI network configuration to install on each node. The special
# values in this config will be automatically populated.
cni_network_config: |-
{
"name": "k8s-pod-network",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "calico",
"log_level": "info",
"etcd_endpoints": "__ETCD_ENDPOINTS__",
"etcd_key_file": "__ETCD_KEY_FILE__",
"etcd_cert_file": "__ETCD_CERT_FILE__",
"etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
"mtu": __CNI_MTU__,
"ipam": {
"type": "calico-ipam"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "__KUBECONFIG_FILEPATH__"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
}
]
}
---
# Source: calico/templates/rbac.yaml
# Include a clusterrole for the kube-controllers component,
# and bind it to the calico-kube-controllers serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-kube-controllers
rules:
# Pods are monitored for changing labels.
# The node controller monitors Kubernetes nodes.
# Namespace and serviceaccount labels are used for policy.
- apiGroups: [""]
resources:
- pods
- nodes
- namespaces
- serviceaccounts
verbs:
- watch
- list
# Watch for changes to Kubernetes NetworkPolicies.
- apiGroups: ["networking.k8s.io"]
resources:
- networkpolicies
verbs:
- watch
- list
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-kube-controllers
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: calico-kube-controllers
subjects:
- kind: ServiceAccount
name: calico-kube-controllers
namespace: kube-system
---
# Include a clusterrole for the calico-node DaemonSet,
# and bind it to the calico-node serviceaccount.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: calico-node
rules:
# The CNI plugin needs to get pods, nodes, and namespaces.
- apiGroups: [""]
resources:
- pods
- nodes
- namespaces
verbs:
- get
- apiGroups: [""]
resources:
- endpoints
- services
verbs:
# Used to discover service IPs for advertisement.
- watch
- list
- apiGroups: [""]
resources:
- nodes/status
verbs:
# Needed for clearing NodeNetworkUnavailable flag.
- patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: calico-node
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: calico-node
subjects:
- kind: ServiceAccount
name: calico-node
namespace: kube-system
---
# Source: calico/templates/calico-node.yaml
# This manifest installs the calico-node container, as well
# as the CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: calico-node
namespace: kube-system
labels:
k8s-app: calico-node
spec:
selector:
matchLabels:
k8s-app: calico-node
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
k8s-app: calico-node
annotations:
# This, along with the CriticalAddonsOnly toleration below,
# marks the pod as a critical add-on, ensuring it gets
# priority scheduling and that its resources are reserved
# if it ever gets evicted.
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
nodeSelector:
beta.kubernetes.io/os: linux
hostNetwork: true
tolerations:
# Make sure calico-node gets scheduled on all nodes.
- effect: NoSchedule
operator: Exists
# Mark the pod as a critical add-on for rescheduling.
- key: CriticalAddonsOnly
operator: Exists
- effect: NoExecute
operator: Exists
serviceAccountName: calico-node
# Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
# deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
terminationGracePeriodSeconds: 0
priorityClassName: system-node-critical
initContainers:
# This container installs the CNI binaries
# and CNI network config file on each node.
- name: install-cni
image: calico/cni:v3.8.0
command: ["/install-cni.sh"]
env:
# Name of the CNI config file to create.
- name: CNI_CONF_NAME
value: "10-calico.conflist"
# The CNI network config to install on each node.
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
name: calico-config
key: cni_network_config
# The location of the etcd cluster.
- name: ETCD_ENDPOINTS
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_endpoints
# CNI MTU Config variable
- name: CNI_MTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
# Prevents the container from sleeping forever.
- name: SLEEP
value: "false"
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
- mountPath: /calico-secrets
name: etcd-certs
# Adds a Flex Volume Driver that creates a per-pod Unix Domain Socket to allow Dikastes
# to communicate with Felix over the Policy Sync API.
- name: flexvol-driver
image: calico/pod2daemon-flexvol:v3.8.0
volumeMounts:
- name: flexvol-driver-host
mountPath: /host/driver
containers:
# Runs calico-node container on each Kubernetes node. This
# container programs network policy and routes on each
# host.
- name: calico-node
image: calico/node:v3.8.0
env:
# The location of the etcd cluster.
- name: ETCD_ENDPOINTS
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_endpoints
# Location of the CA certificate for etcd.
- name: ETCD_CA_CERT_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_ca
# Location of the client key for etcd.
- name: ETCD_KEY_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_key
# Location of the client certificate for etcd.
- name: ETCD_CERT_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_cert
# Set noderef for node controller.
- name: CALICO_K8S_NODE_REF
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# Choose the backend to use.
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
name: calico-config
key: calico_backend
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
value: "k8s,bgp"
# Auto-detect the BGP IP address.
- name: IP
value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
value: "Always"
# Set MTU for tunnel device used if ipip is enabled
- name: FELIX_IPINIPMTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
# The default IPv4 pool to create on startup if none exists. Pod IPs will be
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
value: "true"
# Set Felix endpoint to host default action to ACCEPT.
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: "ACCEPT"
# Disable IPv6 on Kubernetes.
- name: FELIX_IPV6SUPPORT
value: "false"
# Set Felix logging to "info"
- name: FELIX_LOGSEVERITYSCREEN
value: "info"
- name: FELIX_HEALTHENABLED
value: "true"
securityContext:
privileged: true
resources:
requests:
cpu: 250m
livenessProbe:
httpGet:
path: /liveness
port: 9099
host: localhost
periodSeconds: 10
initialDelaySeconds: 10
failureThreshold: 6
readinessProbe:
exec:
command:
- /bin/calico-node
- -bird-ready
- -felix-ready
periodSeconds: 10
volumeMounts:
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
- mountPath: /var/run/calico
name: var-run-calico
readOnly: false
- mountPath: /var/lib/calico
name: var-lib-calico
readOnly: false
- mountPath: /calico-secrets
name: etcd-certs
- name: policysync
mountPath: /var/run/nodeagent
volumes:
# Used by calico-node.
- name: lib-modules
hostPath:
path: /lib/modules
- name: var-run-calico
hostPath:
path: /var/run/calico
- name: var-lib-calico
hostPath:
path: /var/lib/calico
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
# Used to install CNI.
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
- name: cni-net-dir
hostPath:
path: /etc/cni/net.d
# Mount in the etcd TLS secrets with mode 400.
# See https://kubernetes.io/docs/concepts/configuration/secret/
- name: etcd-certs
secret:
secretName: calico-etcd-secrets
defaultMode: 0400
# Used to create per-pod Unix Domain Sockets
- name: policysync
hostPath:
type: DirectoryOrCreate
path: /var/run/nodeagent
# Used to install Flex Volume Driver
- name: flexvol-driver-host
hostPath:
type: DirectoryOrCreate
path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: calico-node
namespace: kube-system
---
# Source: calico/templates/calico-kube-controllers.yaml
# See https://github.com/projectcalico/kube-controllers
apiVersion: apps/v1
kind: Deployment
metadata:
name: calico-kube-controllers
namespace: kube-system
labels:
k8s-app: calico-kube-controllers
spec:
# The controllers can only have a single active instance.
replicas: 1
selector:
matchLabels:
k8s-app: calico-kube-controllers
strategy:
type: Recreate
template:
metadata:
name: calico-kube-controllers
namespace: kube-system
labels:
k8s-app: calico-kube-controllers
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
nodeSelector:
beta.kubernetes.io/os: linux
tolerations:
# Mark the pod as a critical add-on for rescheduling.
- key: CriticalAddonsOnly
operator: Exists
- key: node-role.kubernetes.io/master
effect: NoSchedule
serviceAccountName: calico-kube-controllers
priorityClassName: system-cluster-critical
# The controllers must run in the host network namespace so that
# it isn't governed by policy that would prevent it from working.
hostNetwork: true
containers:
- name: calico-kube-controllers
image: calico/kube-controllers:v3.8.0
env:
# The location of the etcd cluster.
- name: ETCD_ENDPOINTS
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_endpoints
# Location of the CA certificate for etcd.
- name: ETCD_CA_CERT_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_ca
# Location of the client key for etcd.
- name: ETCD_KEY_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_key
# Location of the client certificate for etcd.
- name: ETCD_CERT_FILE
valueFrom:
configMapKeyRef:
name: calico-config
key: etcd_cert
# Choose which controllers to run.
- name: ENABLED_CONTROLLERS
value: policy,namespace,serviceaccount,workloadendpoint,node
volumeMounts:
# Mount in the etcd TLS secrets.
- mountPath: /calico-secrets
name: etcd-certs
readinessProbe:
exec:
command:
- /usr/bin/check-status
- -r
volumes:
# Mount in the etcd TLS secrets with mode 400.
# See https://kubernetes.io/docs/concepts/configuration/secret/
- name: etcd-certs
secret:
secretName: calico-etcd-secrets
defaultMode: 0400
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: calico-kube-controllers
namespace: kube-system
---
# Source: calico/templates/calico-typha.yaml
---
# Source: calico/templates/configure-canal.yaml
---
# Source: calico/templates/kdd-crds.yaml
kubectl apply -f calico-etcd.yaml
Note: modify the parameters (at minimum etcd_endpoints) before applying.
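For example, filling in the etcd address could look like this (the endpoint value is a placeholder for your environment):
sed -i 's#http://<ETCD_IP>:<ETCD_PORT>#http://192.168.18.3:2379#' calico-etcd.yaml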
CoreDNS
Enabling CoreDNS requires two extra kubelet parameters:
--cluster-dns=169.169.0.100 (the IP is the cluster IP of the DNS Service)
--cluster-domain=cluster.local (the domain name configured for the DNS service)
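With these flags set, the kubelet writes the DNS Service IP into every Pod's /etc/resolv.conf; a hedged example of what a Pod in the default namespace then sees (the search path derives from the namespace and --cluster-domain):
nameserver 169.169.0.100
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5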
The CoreDNS YAML to deploy:
apiVersion: v1
kind: ServiceAccount
metadata:
name: coredns
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
rules:
- apiGroups:
- ""
resources:
- endpoints
- services
- pods
- namespaces
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:coredns
subjects:
- kind: ServiceAccount
name: coredns
namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health
ready
kubernetes CLUSTER_DOMAIN REVERSE_CIDRS {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}FEDERATIONS
prometheus :9153
forward . UPSTREAMNAMESERVER
cache 30
loop
reload
loadbalance
}STUBDOMAINS
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: coredns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/name: "CoreDNS"
spec:
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
spec:
priorityClassName: system-cluster-critical
serviceAccountName: coredns
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
nodeSelector:
beta.kubernetes.io/os: linux
containers:
- name: coredns
image: coredns/coredns:1.5.0
imagePullPolicy: IfNotPresent
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
args: [ "-conf", "/etc/coredns/Corefile" ]
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
dnsPolicy: Default
volumes:
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
annotations:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "CoreDNS"
spec:
selector:
k8s-app: kube-dns
# This must match the kubelet --cluster-dns parameter
clusterIP: 169.169.0.100
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
- name: metrics
port: 9153
protocol: TCP
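The Corefile in the ConfigMap above still contains the deployment-script placeholders (CLUSTER_DOMAIN, REVERSE_CIDRS, UPSTREAMNAMESERVER, STUBDOMAINS, FEDERATIONS), which must be replaced before applying. A hedged example of a filled-in Corefile for a cluster.local cluster, with the unused STUBDOMAINS/FEDERATIONS placeholders simply removed:
.:53 {
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}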
To see the configuration in use in an existing cluster, run:
kubectl -n kube-system get configmap coredns -o yaml
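For an end-to-end check of name resolution, a hedged smoke test from a throwaway Pod (busybox:1.28 is chosen because nslookup in newer busybox images is known to misbehave):
kubectl run dns-test --image=busybox:1.28 --restart=Never --rm -it -- nslookup kubernetes.default.svc.cluster.local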