Overview
Loki supports filesystem, object storage, and NoSQL backends. Because object storage mostly means a public cloud, we use Cassandra as the store for now; in the current implementation it backs both the index and the chunks.
Basics
Cassandra is an open-source, distributed, masterless, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented NoSQL database.
Architecture hierarchy:
Cluster - Data center(s) - Rack(s) - Server(s) - Node (more accurately, a vnode)
Node: a single running Cassandra instance
Rack: a group of nodes
Data center: a group of racks
Cluster: the set of all nodes that together own one complete token ring
Cassandra consistency
Reference: https://cassandra.apache.org/doc/latest/architecture/dynamo.html
Apache Cassandra relies on a number of techniques from Amazon's Dynamo distributed key-value storage system. Each node in a Dynamo-style system has three main components:
- Request coordination over a partitioned dataset: when a client connects to a node to issue a read or write, that node acts as the coordinator, bridging the client application and the nodes in the cluster that own the data, and uses the cluster configuration to determine which node(s) in the ring should receive the request
- Ring membership and failure detection
- A local persistence (storage) engine
Replication strategies
Note: all production deployments should use NetworkTopologyStrategy; SimpleStrategy is useful only for testing on clusters where the datacenter layout is not yet known
- NetworkTopologyStrategy
- SimpleStrategy
Consistency levels
ONE Only a single replica must respond
TWO Two replicas must respond
THREE Three replicas must respond
QUORUM A majority of replicas (n / 2 + 1) must respond
ALL All replicas must respond
LOCAL_QUORUM A majority of the replicas in the local datacenter (whichever datacenter the coordinator is in) must respond
EACH_QUORUM A majority of the replicas in each datacenter must respond
LOCAL_ONE Only a single replica must respond; in a multi-datacenter cluster this also guarantees that read requests are not sent to replicas in remote datacenters
ANY A single replica may respond, or the coordinator may store a hint; if a hint is stored, the coordinator will later try to replay it and deliver the mutation to the replicas. Only writes accept this consistency level.
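The majority rule in the list above can be made concrete with a small sketch (the function name and mapping are mine, not part of any Cassandra API): given a replication factor, how many replica acknowledgements each single-datacenter level requires.

```python
# Minimal sketch of how many replica acks each consistency level needs.
# Assumes a single datacenter with replication factor rf; illustrative only.
def required_acks(level: str, rf: int) -> int:
    quorum = rf // 2 + 1  # majority: n / 2 + 1 (integer division)
    table = {
        "ANY": 1,        # a stored hint also counts; writes only
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum,
        "LOCAL_QUORUM": quorum,  # quorum of the local DC's replicas
        "ALL": rf,
    }
    return table[level]

print(required_acks("QUORUM", 3))  # majority of 3 replicas -> 2
```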
Replication factor and consistency level
Cassandra offers tunable consistency, letting us pick the balance between consistency and availability we need, because the client controls how many replicas must receive an update before the call unblocks. This is tuned through the replication factor together with the corresponding consistency level.
The replication factor decides how much performance you are willing to trade for consistency: it is the number of nodes in the cluster an update is propagated to (where an update means any insert, update, or delete).
Each client operation also carries a consistency level parameter, which determines how many replicas must acknowledge a write before it is considered successful, or how many replicas must be read correctly before a read is considered successful.
If you need to, you can set the consistency level equal to the replication factor to get a high level of consistency, at the cost of synchronous blocking: an update only returns once every replica has been written. In practice Cassandra is rarely used this way. Conversely, if a client uses a consistency level lower than the replication factor, writes can still succeed even when some nodes are down.
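A hedged sketch of the overlap rule implied above (the helper name is mine): treating the read and write consistency levels as replica counts r and w, a read is guaranteed to see the latest write exactly when r + w exceeds the replication factor, since the read set and write set must then intersect.

```python
# Sketch: with replication factor rf, read level r and write level w
# expressed as replica counts, every read set intersects every write
# set iff r + w > rf (so quorum reads + quorum writes always overlap).
def read_sees_latest_write(r: int, w: int, rf: int) -> bool:
    return r + w > rf

rf = 3
quorum = rf // 2 + 1  # 2
print(read_sees_latest_write(quorum, quorum, rf))  # True
print(read_sees_latest_write(1, 1, rf))            # False: ONE/ONE can miss it
```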
Installation
Download the package:
# wget -O cassandra-operator-v7.1.0.tar.gz https://github.com/instaclustr/cassandra-operator/archive/v7.1.0.tar.gz
# tar -zxf cassandra-operator-v7.1.0.tar.gz
# cd cassandra-operator-7.1.0/
Deploy the CRDs:
Reference: https://github.com/instaclustr/cassandra-operator/wiki/Installation-and-deployment
# kubectl apply -f deploy/crds.yaml
customresourcedefinition.apiextensions.k8s.io/cassandrabackups.cassandraoperator.instaclustr.com created
customresourcedefinition.apiextensions.k8s.io/cassandraclusters.cassandraoperator.instaclustr.com created
customresourcedefinition.apiextensions.k8s.io/cassandradatacenters.cassandraoperator.instaclustr.com created
Deploy the Cassandra Operator:
# vi deploy/bundle.yaml
# Add namespace: grafana to every resource in the file, since this deployment runs alongside Loki here. The podsecuritypolicy is cluster-scoped and does not need it
containers:
- name: cassandra-operator
#image: "gcr.io/cassandra-operator/cassandra-operator:latest"
image: "ops-harbor.hupu.io/k8s/cassandra-operator:v7.1.0"
# kubectl apply -f deploy/bundle.yaml
serviceaccount/cassandra created
role.rbac.authorization.k8s.io/cassandra created
rolebinding.rbac.authorization.k8s.io/cassandra created
podsecuritypolicy.policy/cassandra created
serviceaccount/cassandra-performance created
role.rbac.authorization.k8s.io/cassandra-performance created
rolebinding.rbac.authorization.k8s.io/cassandra-performance created
podsecuritypolicy.policy/cassandra-performance created
configmap/cassandra-operator-default-config created
deployment.apps/cassandra-operator created
podsecuritypolicy.policy/cassandra-operator created
rolebinding.rbac.authorization.k8s.io/cassandra-operator created
role.rbac.authorization.k8s.io/cassandra-operator created
serviceaccount/cassandra-operator created
# kubectl get pod -n grafana -l name=cassandra-operator
NAME READY STATUS RESTARTS AGE
cassandra-operator-6f685694c5-l7m27 1/1 Running 0 40s
View the logs:
# kubectl logs -f -n grafana $(kubectl get pod -n grafana -l name=cassandra-operator -o name)
Deploy the Cassandra cluster:
Reference: https://github.com/instaclustr/cassandra-operator/wiki/Custom-configuration
Reference: https://cassandra.apache.org/doc/latest/configuration/index.html
The cassandra-operator supports mounting a custom ConfigMap into the cassandra container via a ConfigMapVolumeSource:
Note: all Cassandra and JVM configuration lives under /etc/cassandra inside the container
$ ls -l /etc/cassandra/
total 48
-rw-r--r-- 1 cassandra cassandra 19 Nov 9 13:29 cassandra-env.sh
drwxr-xr-x 2 cassandra cassandra 4096 Nov 28 05:39 cassandra-env.sh.d
-rw-r--r-- 1 cassandra cassandra 19 Nov 9 13:29 cassandra-exporter.conf
-rw-r--r-- 1 cassandra cassandra 70 Nov 28 05:39 cassandra-rackdc.properties
drwxr-xr-x 2 cassandra cassandra 4096 Nov 28 05:39 cassandra.yaml.d
-rw-r--r-- 1 cassandra cassandra 82 Nov 9 13:29 jvm-jmx.options
-rw-r--r-- 1 cassandra cassandra 143 Nov 9 13:29 jvm-operator.options
-rw-r--r-- 1 cassandra cassandra 600 Nov 9 13:29 jvm.options
drwxr-xr-x 2 cassandra cassandra 4096 Nov 28 05:39 jvm.options.d
-rw-r--r-- 1 cassandra cassandra 1239 Nov 9 13:29 logback-tools.xml
-rw-r--r-- 1 cassandra cassandra 538 Nov 9 13:29 logback.xml
drwxr-xr-x 2 cassandra cassandra 4096 Nov 9 13:31 logback.xml.d
$ ls -l /etc/cassandra/cassandra-env.sh.d
total 4
-rw-r--r-- 1 cassandra cassandra 130 Nov 28 05:39 001-cassandra-exporter.sh
$ ls -l /etc/cassandra/cassandra.yaml.d
total 16
-rw-r--r-- 1 cassandra cassandra 187 Nov 9 13:29 001-directories.yaml
-rw-r--r-- 1 cassandra cassandra 404 Nov 28 05:39 001-operator-overrides.yaml
-rw-r--r-- 1 cassandra cassandra 59 Nov 28 05:39 004-broadcast_rpc_address.yaml
-rw-r--r-- 1 cassandra cassandra 29 Nov 28 05:39 cassandra-config.yaml
$ ls -l /etc/cassandra/jvm.options.d
total 4
-rw-r--r-- 1 cassandra cassandra 416 Nov 28 05:39 001-jvm-memory-gc.options
$ ls -l /etc/cassandra/logback.xml.d
total 0
Customize the Cassandra configuration:
# vi cassandra-config.yaml
# Idle connection timeout; disabled by default
#native_transport_idle_timeout_in_ms: 60000
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 30000
# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 30000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 30000
# How long the coordinator should wait for counter writes to complete
counter_write_request_timeout_in_ms: 30000
# How long the coordinator should keep retrying a CAS operation that contends with other proposals for the same row
cas_contention_timeout_in_ms: 30000
# How long the coordinator should wait for truncates to complete (this can be longer, because unless auto_snapshot is disabled we need to flush first so a snapshot can be taken before the data is removed)
truncate_request_timeout_in_ms: 60000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 30000
# Queries slower than this are logged as slow queries
slow_query_log_timeout_in_ms: 5000
# Commented out by default. Enables the exchange of timeout information between nodes, so request timeouts can be measured accurately. If disabled, replicas assume the coordinator forwarded requests to them instantly, which under overload means we waste a lot of extra time processing requests that have already timed out.
# Warning: it is generally assumed that users run NTP on their clusters and that clocks are modestly synchronized, since this is a requirement for the overall correctness of last-write-wins.
#cross_node_timeout: true
# Commented out by default.
#internode_application_send_queue_capacity_in_bytes: 4194304
# Maximum memory for the sstable chunk cache and buffer pool; 32MB of this is reserved for the buffer pool and the rest is used as a cache of uncompressed sstable chunks. Defaults to the smaller of 1/4 of the heap or 512MB. The pool is allocated off-heap, in addition to the memory allocated to the heap. The cache also has an on-heap overhead of roughly 128 bytes per chunk (i.e. 0.2% of the reserved size with the default 64k chunk size). Memory is allocated only when needed.
file_cache_size_in_mb: 2048
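As a quick illustration of the file_cache_size_in_mb default described in the comment above (the helper function is mine, not Cassandra code): the default is the smaller of a quarter of the heap or 512 MB.

```python
# Sketch of the documented file_cache_size_in_mb default:
# the smaller of 1/4 of the heap or 512 MB. Illustrative only.
def default_file_cache_mb(heap_mb: int) -> int:
    return min(heap_mb // 4, 512)

print(default_file_cache_mb(1024))  # 1 GiB heap -> 256 MB cache
print(default_file_cache_mb(8192))  # 8 GiB heap -> capped at 512 MB
```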
# kubectl create configmap cassandra-new --from-file=cassandra-config.yaml -n grafana
Method 1:
# vi cassandra-operator-7.1.0/examples/example-datacenter.yaml
spec:
userConfigMapVolumeSource:
# the name of the ConfigMap
name: cassandra-new
# ConfigMap keys -> file paths (relative to /etc/cassandra)
items:
- key: cassandra-config.yaml
path: cassandra.yaml.d/cassandra-config.yaml
Method 2:
# kubectl edit CassandraDataCenter -n grafana cassandra-dc1
spec:
userConfigMapVolumeSource:
# the name of the ConfigMap
name: cassandra-new
# ConfigMap keys -> file paths (relative to /etc/cassandra)
items:
- key: cassandra-config.yaml
path: cassandra.yaml.d/cassandra-config.yaml
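With either method, the items list maps ConfigMap keys to destination files relative to /etc/cassandra. A small sketch of the resulting mount path (illustrative only, not operator code):

```python
# Sketch: how the ConfigMapVolumeSource items map ConfigMap keys to
# files under /etc/cassandra (the item below is from the example above).
import posixpath

items = [{"key": "cassandra-config.yaml",
          "path": "cassandra.yaml.d/cassandra-config.yaml"}]

mounted = {i["key"]: posixpath.join("/etc/cassandra", i["path"]) for i in items}
print(mounted["cassandra-config.yaml"])
```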
Note: JVM options can also be customized, by supplying an options file under jvm.options.d/gc.options
Default JVM options:
# kubectl get cm -n grafana cassandra-cassandra-dc1-dc1-operator-config -o yaml
data:
jvm_options_d_001_jvm_memory_gc_options: |
-Xms1073741824
-Xmx1073741824
-Xmn4194304
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled
-XX:+HeapDumpOnOutOfMemoryError
-XX:+CrashOnOutOfMemoryError
Note: this globally overrides the defaults rather than appending to them
# vi gc.options
-XX:+UseG1GC
-XX:ParallelGCThreads=8
-XX:MaxGCPauseMillis=200
#-Xms8g
#-Xmx8g
#-Xmn4g
-XX:+UseContainerSupport
#-XX:InitialRAMPercentage=15.0
#-XX:MinRAMPercentage=15.0
#-XX:MaxRAMPercentage=75.0
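If the commented-out RAMPercentage flags above were enabled, the JVM would size the heap from the container memory limit instead of fixed -Xms/-Xmx values; a sketch of that arithmetic (the function name is mine):

```python
# Sketch of how -XX:MaxRAMPercentage sizes the max heap from the
# container memory limit (values mirror the commented-out options above).
def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    return int(container_limit_bytes * max_ram_percentage / 100)

gib = 1024 ** 3
print(max_heap_bytes(32 * gib, 75.0) // gib)  # 32Gi limit, 75% -> 24 GiB heap
```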
# kubectl delete configmap cassandra-new -n grafana
# kubectl create configmap cassandra-new --from-file=gc.options --from-file=cassandra-config.yaml -n grafana
# vi cassandra-operator-7.1.0/examples/example-datacenter.yaml
spec:
userConfigMapVolumeSource:
# the name of the ConfigMap
name: cassandra-new
# ConfigMap keys -> file paths (relative to /etc/cassandra)
items:
- key: cassandra-config.yaml
path: cassandra.yaml.d/cassandra-config.yaml
- key: gc.options
path: jvm.options.d/gc.options
Modify the CassandraDataCenter example:
# vi examples/example-datacenter.yaml
apiVersion: cassandraoperator.instaclustr.com/v1alpha1
kind: CassandraDataCenter
metadata:
name: cassandra-dc1
namespace: grafana
labels:
app: cassandra
datacenter: dc1
cluster: cassandra-dc1
spec:
initImage: ops-harbor.hupu.io/base/alpine:v3.10
# Exposed through the cassandra-operator-metrics ServiceMonitor; the image ships with cassandra-exporter built in
prometheusSupport: true
optimizeKernelParams: true
serviceAccountName: cassandra-performance
nodes: 3
racks:
- name: rack1
# effectively acts as a nodeSelector
labels:
failure-domain.beta.kubernetes.io/zone: cn-hangzhou-g
# tolerations
tolerations:
- key: "app"
operator: "Equal"
value: "cassandra"
effect: "NoSchedule"
# affinity
affinity:
# pod anti-affinity
podAntiAffinity:
# required (hard) pod anti-affinity
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- cassandra
topologyKey: "kubernetes.io/hostname"
# preferred (soft) pod anti-affinity
#preferredDuringSchedulingIgnoredDuringExecution:
#- podAffinityTerm:
# labelSelector:
# matchExpressions:
# - key: app
# operator: In
# values:
# - cassandra
# topologyKey: kubernetes.io/hostname
# weight: 100
# node affinity
#nodeAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms:
# - matchExpressions:
# - key: system
# operator: NotIn
# values:
# - management
# - key: app
# operator: In
# values:
# - cassandra
# #preferredDuringSchedulingIgnoredDuringExecution:
# #- weight: 60
# # preference:
# # matchExpressions:
# # - {key: zone, operator: In, values: ["shanghai2", "shanghai3", "shanghai4"]}
# #- weight: 40
# # preference:
# # matchFields:
# # - {key: ssd, operator: Exists, values: ["sanxing", "dongzhi"]}
#racks:
# - name: "west1-b"
# labels:
# failure-domain.beta.kubernetes.io/zone: europe-west1-b
# - name: "west1-c"
# labels:
# failure-domain.beta.kubernetes.io/zone: europe-west1-c
# - name: "west1-a"
# labels:
# failure-domain.beta.kubernetes.io/zone: europe-west1-a
#cassandraImage: "gcr.io/cassandra-operator/cassandra-3.11.6:latest"
cassandraImage: "ops-harbor.hupu.io/k8s/cassandra-3.11.9:latest"
#sidecarImage: "gcr.io/cassandra-operator/instaclustr-icarus:latest"
sidecarImage: "ops-harbor.hupu.io/k8s/instaclustr-icarus:latest"
imagePullPolicy: Always
imagePullSecrets:
- name: regcred
# Has no effect; upstream commented the field out: https://github.com/liwang0513/cassandra-operator/commit/b4f8b596013e5cbeaf222957b5aaa9b52a91efd7
podManagementPolicy: Parallel
# When used to store Loki logs, pods often exceeded 2GB and were OOM-killed; average usage is around 4GB of memory and 1.5 CPU cores
readinessProbe:
exec:
command:
- /usr/bin/cql-readiness-probe
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
memory: 32Gi
cpu: "16"
requests:
memory: 4Gi
cpu: "2"
sidecarResources:
limits:
memory: 512Mi
requests:
memory: 512Mi
dataVolumeClaimSpec:
storageClassName: alicloud-disk-efficiency-cn-hangzhou-g
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2048Gi
# Mount the custom ConfigMap into the cassandra container via ConfigMapVolumeSource to override cassandra.yaml settings
userConfigMapVolumeSource:
# the name of the ConfigMap
name: cassandra-new
type: array
# ConfigMap keys -> file paths (relative to /etc/cassandra)
items:
- key: cassandra-config.yaml
path: cassandra.yaml.d/cassandra-config.yaml
- key: gc.options
path: jvm.options.d/gc.options
# userSecretVolumeSource:
# secretName: test-cassandra-dc-ssl
#
# sidecarSecretVolumeSource:
# secretName: test-cassandra-dc-ssl-sidecar
cassandraAuth:
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
roleManager: CassandraRoleManager
# operatorLabels:
# prometheusService:
# cassandratestdclabel: testdc
# nodesService:
# mynodesservicelabel: labelvalue1
# statefulSet:
# mystatefullabel: labelvalue2
# podTemplate:
# mypodlabel: label1
# myanotherpod: label2
#
# operatorAnnotations:
# prometheusService:
# p1 : pv1
# nodesService:
# n1: nv1
# n2: nv2
# statefulSet:
# s1: sv1
# s2: sv2
# podTemplate:
# pt1: ptv1
# pt2: ptv2
# Needed to run on AKS
fsGroup: 999
# kubectl apply -f examples/example-datacenter.yaml
cassandradatacenter.cassandraoperator.instaclustr.com/cassandra-dc1 created
# kubectl get pod -n grafana -l cassandra-operator.instaclustr.com/cluster=cassandra-dc1
NAME READY STATUS RESTARTS AGE
cassandra-cassandra-dc1-dc1-rack1-0 2/2 Running 0 16m
cassandra-cassandra-dc1-dc1-rack1-1 2/2 Running 0 14m
cassandra-cassandra-dc1-dc1-rack1-2 2/2 Running 0 12m
# kubectl get svc -n grafana |grep cassandra
cassandra-cassandra-dc1-dc1-nodes ClusterIP None <none> 9042/TCP,7199/TCP 176m
cassandra-cassandra-dc1-dc1-prometheus ClusterIP None <none> 9500/TCP 176m
cassandra-cassandra-dc1-dc1-seeds ClusterIP None <none> 7000/TCP 176m
cassandra-operator-metrics ClusterIP 172.21.5.56 <none> 8383/TCP,8686/TCP 4h5m
Verify cluster health:
# kubectl exec cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra -n grafana -- nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.41.180.90 85.66 KiB 256 67.9% 0c59d821-e2ad-4040-98e5-363f7029f3ca rack1
UN 10.41.182.254 81.05 KiB 256 66.3% 3fb8f3f7-2305-4829-99bc-bc62ce30af56 rack1
UN 10.41.190.194 94.99 KiB 256 65.7% 4f70b0e2-6edb-4b51-8472-fe497fa22b88 rack1
Run a test query:
# kubectl exec cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra -n grafana -- cqlsh -e "SELECT now() FROM system.local;" cassandra-cassandra-dc1-dc1-nodes -ucassandra -pcassandra
system.now()
--------------------------------------
c05823f0-2e08-11eb-a1bc-4ddca19f6dc1
(1 rows)
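The value returned by now() is a timeuuid (a version-1, time-based UUID); as an aside, its embedded timestamp can be recovered with Python's stdlib uuid module. The UUID below is the one from the query output above.

```python
import datetime
import uuid

# now() returns a timeuuid (version-1 UUID); .time is the count of
# 100-ns intervals since the Gregorian epoch, 1582-10-15.
u = uuid.UUID("c05823f0-2e08-11eb-a1bc-4ddca19f6dc1")
GREGORIAN_TO_UNIX = 0x01B21DD213814000  # epoch offset in 100-ns units
unix_ts = (u.time - GREGORIAN_TO_UNIX) / 1e7
print(datetime.datetime.fromtimestamp(unix_ts, tz=datetime.timezone.utc))
```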
Selected cassandra operations:
Show cluster information:
# describe cluster;
Cluster: cassandra-dc1
Partitioner: Murmur3Partitioner
List all keyspaces:
# describe keyspaces;
system_schema system_auth system loki system_distributed system_traces
Show a keyspace definition:
# describe keyspace loki;
Create a keyspace:
# CREATE KEYSPACE loki WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
Replication factor: the number of nodes each new piece of data is replicated to. Odd numbers are common; our project, for example, sets replication_factor=3
Replica placement strategy: the replication strategy. The default is SimpleStrategy; for a single-rack, single-datacenter setup, SimpleStrategy is fine
Or, for a multi-datacenter strategy:
# CREATE KEYSPACE loki WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 1 };
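The multi-datacenter statement can be generated from a per-DC replication map; a small sketch (the helper is mine, not a driver API):

```python
# Sketch: building the NetworkTopologyStrategy CQL shown above from a
# per-datacenter replication map (helper name is illustrative).
def create_keyspace_cql(name: str, dc_rf: dict) -> str:
    opts = ", ".join(f"'{dc}' : {rf}" for dc, rf in sorted(dc_rf.items()))
    return (f"CREATE KEYSPACE {name} WITH REPLICATION = "
            f"{{ 'class' : 'NetworkTopologyStrategy', {opts} }};")

print(create_keyspace_cql("loki", {"dc1": 1}))
```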
Drop a keyspace:
# DROP KEYSPACE loki;
Show details:
# SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
--------------------+----------------+-------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
loki | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
(6 rows)
Change the replication strategy (note that NetworkTopologyStrategy takes a per-DC replication factor, not the replication_factor option; see Error 2 below):
# ALTER KEYSPACE loki WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
Use a keyspace:
# use loki;
List all tables:
# describe tables;
# desc tables;
Show a table schema:
# describe columnfamily abc;
# desc table stocks;
Create a table:
# create table abc ( id int primary key, name varchar, age int );
Drop a table:
# drop table user;
Troubleshooting
Error 1:
# kubectl logs -f cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra
INFO [main] Server.java:159 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO [main] CassandraDaemon.java:564 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO [main] CassandraDaemon.java:650 Startup complete
WARN [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
WARN [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
Resolution:
Cluster membership cannot be established on redeploy, because the statefulset creates its pods one at a time instead of in parallel
# kubectl get pod -n grafana
NAME READY STATUS RESTARTS AGE
cassandra-cassandra-dc1-dc1-rack1-0 1/2 Running 0 6m12s
cassandra-operator-6f685694c5-l7m27 1/1 Running 0 4d7h
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-volume-cassandra-cassandra-dc1-dc1-rack1-0 Bound disk-29d3cfdd-dc5a-457e-bae1-6b72dcc34c37 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h12m
data-volume-cassandra-cassandra-dc1-dc1-rack1-1 Bound disk-9a8621f6-3f8b-428e-b69d-72cde007c7cf 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h6m
data-volume-cassandra-cassandra-dc1-dc1-rack1-2 Bound disk-1971e0c4-fdf5-4adf-85fa-c1e9e53b7658 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h5m
data-volume-cassandra-cassandra-dc1-dc1-rack1-3 Bound disk-5be7e523-a3cc-4b32-9149-6a3ab5e44ed2 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h3m
data-volume-cassandra-cassandra-dc1-dc1-rack1-4 Bound disk-4a4d235b-871f-45ff-be57-c4ed7c9b4ad2 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h2m
data-volume-cassandra-cassandra-dc1-dc1-rack1-5 Bound disk-b9c45b99-f169-413b-b8dc-65b97d205264 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 4h
data-volume-cassandra-cassandra-dc1-dc1-rack1-6 Bound disk-c2bf3596-a986-4099-b746-316ddaf36c8f 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 72m
data-volume-cassandra-cassandra-dc1-dc1-rack1-7 Bound disk-89fae7ec-9f5a-4b2f-9191-631f66ac71b8 2Ti RWO alicloud-disk-efficiency-cn-hangzhou-g 57m
Looking at the upstream source, the podManagementPolicy: Parallel functionality appears to have been commented out:
@@ -242,7 +241,7 @@ private V1beta2StatefulSet generateStatefulSet(DataCenterKey dataCenterKey, V1Co
)
.spec(new V1beta2StatefulSetSpec()
.serviceName("cassandra")
.podManagementPolicy("Parallel")
//.podManagementPolicy("Parallel")
.replicas(dataCenter.getSpec().getReplicas().intValue())
.selector(new V1LabelSelector().putMatchLabelsItem("cassandra-datacenter", dataCenterKey.name))
.template(new V1PodTemplateSpec()
The only workaround for now is to delete the PVCs and recreate the cassandradatacenter
An issue has been filed upstream: https://github.com/instaclustr/cassandra-operator/issues/397
Error 2:
# kubectl exec cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra -n grafana -- cqlsh -e "ALTER KEYSPACE loki WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'replication_factor': '3'};" cassandra-cassandra-dc1-dc1-nodes -ucassandra -pcassandra
<stdin>:1:ConfigurationException: replication_factor is an option for SimpleStrategy, not NetworkTopologyStrategy
Resolution:
replication_factor is not a valid option for NetworkTopologyStrategy, which instead lets you define a separate replication factor per DC. The correct query is:
# CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
Issue: the initContainer image cannot be specified:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned grafana/cassandra-cassandra-dc1-dc1-rack1-0 to cn-hangzhou.10.41.128.145
Normal Pulling 45s kubelet, cn-hangzhou.10.41.128.145 Pulling image "busybox:latest"
# egrep -r busybox cassandra-operator-7.1.0/
cassandra-operator-7.1.0/pkg/controller/cassandradatacenter/statefulset.go: var image = "busybox:latest"
Resolution:
Reference: https://github.com/instaclustr/cassandra-operator/issues/379
v6.4.0
added initImage field into spec for init container, up to now, it was always busybox:latest. It defaults to this image if that field is empty.
# vi cassandra-operator-7.1.0/examples/example-datacenter.yaml
spec:
initImage: ops-harbor.hupu.io/base/alpine:v3.10