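1. Preface
Please credit the original source when reposting, and respect the work of others!
This article analyzes how the default scheduler is registered and how it is used. It mainly involves two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go.
Source location: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)
2. Registering the default scheduler
You have probably come across a file like the one below, whether or not you fully understood it. The rest of this article should help make sense of it.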
{
"kind" : "Policy",
"apiVersion" : "v1",
"predicates" : [
{"name" : "PodFitsHostPorts"},
{"name" : "PodFitsResources"},
{"name" : "NoDiskConflict"},
{"name" : "MatchNodeSelector"},
{"name" : "HostName"}
],
"priorities" : [
{"name" : "LeastRequestedPriority", "weight" : 1},
{"name" : "BalancedResourceAllocation", "weight" : 1},
{"name" : "ServiceSpreadingPriority", "weight" : 1},
{"name" : "EqualPriority", "weight" : 1}
]
}
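When kube-scheduler needs to schedule a pod and several nodes are available, how does it decide which node the pod goes to?
As is well known, kube-scheduler first runs the predicates to select the nodes that are able to run the pod (some nodes cannot, for example because of insufficient resources or node affinity rules), and then runs the priorities to pick the highest-scoring node among those that passed as the node the pod will finally run on.
A node passes the predicate phase only if it satisfies every configured predicate, such as PodFitsHostPorts and PodFitsResources in the file above.
In the priority phase each method has a weight, and the pod's score on a node is the weighted sum of the scores from these methods.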
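The two phases described above can be sketched in a few lines of Go. This is only an illustration under simplifying assumptions: the Node, Pod, Predicate and Priority types and the fitsResources and mostFreeCPU helpers are hypothetical and are not the types kube-scheduler uses.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a cluster node.
type Node struct {
	Name    string
	FreeCPU int // free CPU in millicores
	FreeMem int // free memory in MiB
}

// Pod is a hypothetical, simplified stand-in for a pod's resource request.
type Pod struct {
	CPU int
	Mem int
}

// Predicate reports whether the pod can run on the node at all.
type Predicate func(p Pod, n Node) bool

// Priority scores a node for the pod; Weight scales its contribution.
type Priority struct {
	Score  func(p Pod, n Node) int
	Weight int
}

// schedule keeps only the nodes that pass every predicate, then returns the
// name of the feasible node with the highest weighted score.
// ok is false when no node passes the predicates.
func schedule(p Pod, nodes []Node, preds []Predicate, prios []Priority) (best string, ok bool) {
	bestScore := -1
	for _, n := range nodes {
		fits := true
		for _, pred := range preds {
			if !pred(p, n) {
				fits = false
				break
			}
		}
		if !fits {
			continue
		}
		total := 0
		for _, pr := range prios {
			total += pr.Weight * pr.Score(p, n)
		}
		if total > bestScore {
			best, bestScore, ok = n.Name, total, true
		}
	}
	return best, ok
}

// fitsResources is a toy predicate: the request must fit in the free capacity.
func fitsResources(p Pod, n Node) bool { return p.CPU <= n.FreeCPU && p.Mem <= n.FreeMem }

// mostFreeCPU is a toy priority: more CPU left after placement scores higher.
func mostFreeCPU(p Pod, n Node) int { return (n.FreeCPU - p.CPU) / 100 }

func main() {
	nodes := []Node{{"node-a", 500, 1024}, {"node-b", 2000, 4096}, {"node-c", 100, 8192}}
	pod := Pod{CPU: 300, Mem: 512}
	best, ok := schedule(pod, nodes, []Predicate{fitsResources}, []Priority{{mostFreeCPU, 1}})
	fmt.Println(best, ok)
}
```

In this toy setup node-c is dropped by the predicate, and node-b wins over node-a because it has more CPU left.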
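Before looking at how the default scheduler is registered, we first need to introduce pkg/scheduler/factory/plugins.go, because that file exists precisely to support scheduler registration.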
3. pkg/scheduler/factory/plugins.go
type PluginFactoryArgs struct {
PodLister algorithm.PodLister
ServiceLister algorithm.ServiceLister
ControllerLister algorithm.ControllerLister
ReplicaSetLister algorithm.ReplicaSetLister
StatefulSetLister algorithm.StatefulSetLister
NodeLister algorithm.NodeLister
PDBLister algorithm.PDBLister
NodeInfo predicates.NodeInfo
PVInfo predicates.PersistentVolumeInfo
PVCInfo predicates.PersistentVolumeClaimInfo
StorageClassInfo predicates.StorageClassInfo
VolumeBinder *volumebinder.VolumeBinder
HardPodAffinitySymmetricWeight int32
}
type FitPredicateFactory func(PluginFactoryArgs) algorithm.FitPredicate
type PriorityFunctionFactory func(PluginFactoryArgs) algorithm.PriorityFunction
type PriorityFunctionFactory2 func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction)
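FitPredicateFactory: builds a predicate from PluginFactoryArgs and returns it.
PriorityFunctionFactory: builds a priority function from PluginFactoryArgs and returns it (the older form).
PriorityFunctionFactory2: builds a priority from PluginFactoryArgs and returns it as a pair of Map and Reduce functions (the newer form).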
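To make the Map and Reduce split more concrete, here is a minimal sketch, assuming heavily simplified signatures. HostScore, MapFunc, ReduceFunc, Args, Factory2 and cachedImagesFactory are hypothetical names for illustration; the real algorithm.PriorityMapFunction and algorithm.PriorityReduceFunction take more parameters.

```go
package main

import "fmt"

// HostScore is a simplified stand-in for a per-node score entry.
type HostScore struct {
	Host  string
	Score int
}

// MapFunc scores a single node in isolation (hypothetical simplified signature).
type MapFunc func(host string) HostScore

// ReduceFunc post-processes the scores of all nodes together, in place.
type ReduceFunc func(scores []HostScore)

// Args is a hypothetical stand-in for PluginFactoryArgs.
type Args struct {
	ImagesOnHost map[string]int // number of the pod's images already cached per host
}

// Factory2 mirrors the shape of PriorityFunctionFactory2: args in, map and reduce out.
type Factory2 func(Args) (MapFunc, ReduceFunc)

const maxScore = 10

// cachedImagesFactory builds a map function that reads from args and a reduce
// function that rescales the raw scores to the range 0..maxScore.
func cachedImagesFactory(args Args) (MapFunc, ReduceFunc) {
	mapFn := func(host string) HostScore {
		return HostScore{Host: host, Score: args.ImagesOnHost[host]}
	}
	reduceFn := func(scores []HostScore) {
		highest := 0
		for _, s := range scores {
			if s.Score > highest {
				highest = s.Score
			}
		}
		if highest == 0 {
			return
		}
		for i := range scores {
			scores[i].Score = scores[i].Score * maxScore / highest
		}
	}
	return mapFn, reduceFn
}

// run applies mapFn to each host and then reduceFn (if non-nil) to the results.
func run(f Factory2, args Args, hosts []string) []HostScore {
	mapFn, reduceFn := f(args)
	scores := make([]HostScore, 0, len(hosts))
	for _, h := range hosts {
		scores = append(scores, mapFn(h))
	}
	if reduceFn != nil {
		reduceFn(scores)
	}
	return scores
}

func main() {
	args := Args{ImagesOnHost: map[string]int{"node-a": 1, "node-b": 4}}
	fmt.Println(run(cachedImagesFactory, args, []string{"node-a", "node-b", "node-c"}))
}
```

The map step needs only one node at a time, while the reduce step sees every score at once, which is what makes normalization possible there. A factory may also return a nil reduce function when no post-processing is needed.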
3.1 Basic structures
type PriorityConfigFactory struct {
Function PriorityFunctionFactory
MapReduceFunction PriorityFunctionFactory2
Weight int
}
var (
schedulerFactoryMutex sync.Mutex
// maps that hold registered algorithm types
fitPredicateMap = make(map[string]FitPredicateFactory)
mandatoryFitPredicates = sets.NewString()
priorityFunctionMap = make(map[string]PriorityConfigFactory)
algorithmProviderMap = make(map[string]AlgorithmProviderConfig)
// Registered metadata producers
priorityMetadataProducer PriorityMetadataProducerFactory
predicateMetadataProducer PredicateMetadataProducerFactory
)
const (
// DefaultProvider defines the default algorithm provider name.
DefaultProvider = "DefaultProvider"
)
type AlgorithmProviderConfig struct {
FitPredicateKeys sets.String
PriorityFunctionKeys sets.String
}
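As shown above, the default scheduler's name is DefaultProvider.
fitPredicateMap: a global variable that maps each predicate name to the FitPredicateFactory that builds that predicate.
priorityFunctionMap: also a global variable; it maps each priority name to the PriorityConfigFactory that builds that priority.
algorithmProviderMap: also a global variable; it maps a scheduler name (for example DefaultProvider) to the names of all predicates and all priorities it owns, since AlgorithmProviderConfig holds both sets of names.
mandatoryFitPredicates: a global variable holding the names of the mandatory predicates.
3.2 Registering predicates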
// pkg/scheduler/factory/plugins.go
func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string {
return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
}
// Check with a regular expression that the name is valid
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")
func validateAlgorithmNameOrDie(name string) {
if !validName.MatchString(name) {
klog.Fatalf("Algorithm name %v does not match the name validation regexp \"%v\".", name, validName)
}
}
func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
validateAlgorithmNameOrDie(name)
fitPredicateMap[name] = predicateFactory
return name
}
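RegisterFitPredicate is straightforward: it takes a predicate name and a predicate function, and the FitPredicateFactory it registers simply returns that same predicate whenever it is asked to build one. It then returns name.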
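The name check performed by validateAlgorithmNameOrDie can be tried in isolation. The sketch below copies the validName pattern from the listing above; isValidName is a hypothetical helper that returns a bool instead of calling klog.Fatalf, so that rejected names can be inspected without exiting the process.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as validName in pkg/scheduler/factory/plugins.go.
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

// isValidName reports whether name passes the pattern (hypothetical helper).
func isValidName(name string) bool { return validName.MatchString(name) }

func main() {
	names := []string{"PodFitsResources", "my-predicate", "a", "-bad", "bad-", "has_underscore"}
	for _, name := range names {
		fmt.Printf("%-18q %v\n", name, isValidName(name))
	}
}
```

Because the parenthesized group is not optional, the pattern needs at least two characters, so a single-character name such as "a" is rejected. Names must also start and end with a letter or digit, and may contain hyphens only in between.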
The next function registers a caller-supplied FitPredicateFactory. It leaves the factory untouched, stores it in the map, and returns name. RegisterMandatoryFitPredicate does one extra step: it also adds name to mandatoryFitPredicates.
func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
validateAlgorithmNameOrDie(name)
fitPredicateMap[name] = predicateFactory
return name
}
func RegisterMandatoryFitPredicate(name string, predicate algorithm.FitPredicate) string {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
validateAlgorithmNameOrDie(name)
fitPredicateMap[name] = func(PluginFactoryArgs) algorithm.FitPredicate { return predicate }
mandatoryFitPredicates.Insert(name)
return name
}
Next, let's see how the defaultPredicates function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func defaultPredicates() sets.String {
return sets.NewString(
factory.RegisterFitPredicateFactory(
predicates.NoVolumeZoneConflictPred,
func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo, args.StorageClassInfo)
},
),
...
factory.RegisterMandatoryFitPredicate(predicates.CheckNodeConditionPred, predicates.CheckNodeConditionPredicate),
factory.RegisterFitPredicate(predicates.PodToleratesNodeTaintsPred, predicates.PodToleratesNodeTaints),
...
)
}
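As shown, it calls RegisterFitPredicateFactory, RegisterMandatoryFitPredicate and RegisterFitPredicate, so the global variable fitPredicateMap ends up holding every registered predicate name together with the predicateFactory that builds the corresponding predicate.
The return value of defaultPredicates() is the set of names it registered, each of which is now a key in fitPredicateMap.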
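The three registration functions, and the way defaultPredicates() collects their return values, can be sketched as a small registry. This is a simplified illustration: Args, FitPredicate, the registry variables and the predicate names are hypothetical, and a sorted slice stands in for sets.String.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Args is a hypothetical stand-in for PluginFactoryArgs.
type Args struct{ MaxPodsPerNode int }

// FitPredicate is a simplified predicate: can a node that already runs
// podsOnNode pods take one more?
type FitPredicate func(podsOnNode int) bool

// FitPredicateFactory builds a predicate from Args.
type FitPredicateFactory func(Args) FitPredicate

var (
	mu        sync.Mutex
	factories = map[string]FitPredicateFactory{}
	mandatory = map[string]bool{}
)

// RegisterFactory stores a caller-supplied factory and returns its name.
func RegisterFactory(name string, f FitPredicateFactory) string {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
	return name
}

// Register wraps a ready-made predicate in a factory that ignores Args.
func Register(name string, p FitPredicate) string {
	return RegisterFactory(name, func(Args) FitPredicate { return p })
}

// RegisterMandatory registers the predicate and marks the name as mandatory.
// It writes to the map directly: calling Register while holding mu would
// deadlock, because sync.Mutex is not reentrant.
func RegisterMandatory(name string, p FitPredicate) string {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = func(Args) FitPredicate { return p }
	mandatory[name] = true
	return name
}

// defaultNames registers three predicates and collects the returned names.
func defaultNames() []string {
	names := []string{
		RegisterFactory("MaxPods", func(a Args) FitPredicate {
			return func(podsOnNode int) bool { return podsOnNode < a.MaxPodsPerNode }
		}),
		Register("AlwaysFits", func(int) bool { return true }),
		RegisterMandatory("NodeReady", func(int) bool { return true }),
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(defaultNames(), mandatory["NodeReady"])

	// The concrete predicate is built only now, when Args are known.
	maxPods := factories["MaxPods"](Args{MaxPodsPerNode: 2})
	fmt.Println(maxPods(1), maxPods(2))
}
```

Returning the name from every registration call is what lets a single expression both register the predicates and build the set of their names.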
3.3 Registering priorities
// pkg/scheduler/factory/plugins.go
func RegisterPriorityConfigFactory(name string, pcf PriorityConfigFactory) string {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
validateAlgorithmNameOrDie(name)
priorityFunctionMap[name] = pcf
return name
}
func RegisterPriorityFunction2(
name string,
mapFunction algorithm.PriorityMapFunction,
reduceFunction algorithm.PriorityReduceFunction,
weight int) string {
return RegisterPriorityConfigFactory(name, PriorityConfigFactory{
MapReduceFunction: func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
return mapFunction, reduceFunction
},
Weight: weight,
})
}
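RegisterPriorityFunction2 was added in later development and carries the map-reduce functions. To stay compatible with earlier versions, both the old and the new form register the same thing, a PriorityConfigFactory that builds the priority. The function then returns name.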
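One way a consumer could handle a PriorityConfigFactory that carries either form is sketched below. This is an assumption-laden illustration, not the scheduler's own code: ConfigFactory, LegacyFunc, MapFunc, ReduceFunc and scoreAll are hypothetical simplified names, and the sketch assumes exactly one of the two factory fields is set.

```go
package main

import "fmt"

// Args is a hypothetical stand-in for PluginFactoryArgs.
type Args struct{}

// LegacyFunc is the older style: one call scores every host.
type LegacyFunc func(hosts []string) map[string]int

// MapFunc and ReduceFunc are the newer style: score one host, then
// optionally post-process all scores together.
type MapFunc func(host string) int
type ReduceFunc func(scores map[string]int)

// ConfigFactory mirrors the shape of PriorityConfigFactory. This sketch
// assumes exactly one of Function or MapReduceFunction is set.
type ConfigFactory struct {
	Function          func(Args) LegacyFunc
	MapReduceFunction func(Args) (MapFunc, ReduceFunc)
	Weight            int
}

// scoreAll evaluates whichever form the factory carries and applies Weight.
func scoreAll(cf ConfigFactory, args Args, hosts []string) map[string]int {
	var scores map[string]int
	if cf.Function != nil {
		scores = cf.Function(args)(hosts)
	} else {
		mapFn, reduceFn := cf.MapReduceFunction(args)
		scores = make(map[string]int, len(hosts))
		for _, h := range hosts {
			scores[h] = mapFn(h)
		}
		if reduceFn != nil {
			reduceFn(scores)
		}
	}
	for h := range scores {
		scores[h] *= cf.Weight
	}
	return scores
}

// nameLength is a toy scoring rule shared by both registrations below.
func nameLength(host string) int { return len(host) }

// legacy expresses the rule in the older single-function form.
var legacy = ConfigFactory{
	Function: func(Args) LegacyFunc {
		return func(hosts []string) map[string]int {
			out := make(map[string]int, len(hosts))
			for _, h := range hosts {
				out[h] = nameLength(h)
			}
			return out
		}
	},
	Weight: 2,
}

// mapReduce expresses the same rule in the newer form, with no reduce step.
var mapReduce = ConfigFactory{
	MapReduceFunction: func(Args) (MapFunc, ReduceFunc) { return nameLength, nil },
	Weight:            2,
}

func main() {
	hosts := []string{"n1", "node2"}
	fmt.Println(scoreAll(legacy, Args{}, hosts))
	fmt.Println(scoreAll(mapReduce, Args{}, hosts))
}
```

Storing both forms in one struct means the registry map needs only a single value type, and callers can treat old and new priorities uniformly.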
Next, let's see how the defaultPriorities function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func defaultPriorities() sets.String {
return sets.NewString(
// spreads pods by minimizing the number of pods (belonging to the same service or replication controller) on the same node.
factory.RegisterPriorityConfigFactory(
"SelectorSpreadPriority",
factory.PriorityConfigFactory{
MapReduceFunction: func(args factory.PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister, args.StatefulSetLister)
},
Weight: 1,
},
),
...
factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1),
)
}
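This works just like the predicates: every registered priority ends up in the global variable priorityFunctionMap, and defaultPriorities() returns the names of the priorities it registered.
3.4 Registering a scheduler
Registering a scheduler requires the scheduler's name (name) together with the predicates (predicateKeys) and the priorities (priorityKeys) it owns.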
// pkg/scheduler/factory/plugins.go
func RegisterAlgorithmProvider(name string, predicateKeys, priorityKeys sets.String) string {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
validateAlgorithmNameOrDie(name)
algorithmProviderMap[name] = AlgorithmProviderConfig{
FitPredicateKeys: predicateKeys,
PriorityFunctionKeys: priorityKeys,
}
return name
}
Next, let's see how the registerAlgorithmProvider function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func registerAlgorithmProvider(predSet, priSet sets.String) {
factory.RegisterAlgorithmProvider(factory.DefaultProvider, predSet, priSet)
...
}
This function stores the default scheduler in the global variable algorithmProviderMap, so it can be retrieved via algorithmProviderMap["DefaultProvider"].
// pkg/scheduler/factory/plugins.go
func GetAlgorithmProvider(name string) (*AlgorithmProviderConfig, error) {
schedulerFactoryMutex.Lock()
defer schedulerFactoryMutex.Unlock()
provider, ok := algorithmProviderMap[name]
if !ok {
return nil, fmt.Errorf("plugin %q has not been registered", name)
}
return &provider, nil
}
GetAlgorithmProvider looks a scheduler up by name, so GetAlgorithmProvider("DefaultProvider") returns the default scheduler.
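The register and lookup pair can be exercised with a minimal sketch. ProviderConfig, RegisterProvider and GetProvider are hypothetical simplified counterparts of the functions above, with plain string slices standing in for sets.String and name validation left out.

```go
package main

import (
	"fmt"
	"sync"
)

// ProviderConfig is a simplified stand-in for AlgorithmProviderConfig.
type ProviderConfig struct {
	PredicateKeys []string
	PriorityKeys  []string
}

var (
	mu        sync.Mutex
	providers = map[string]ProviderConfig{}
)

// RegisterProvider stores the key sets under name and returns name.
func RegisterProvider(name string, predicateKeys, priorityKeys []string) string {
	mu.Lock()
	defer mu.Unlock()
	providers[name] = ProviderConfig{PredicateKeys: predicateKeys, PriorityKeys: priorityKeys}
	return name
}

// GetProvider returns a pointer to a copy of the stored config, or an error
// if nothing was registered under name.
func GetProvider(name string) (*ProviderConfig, error) {
	mu.Lock()
	defer mu.Unlock()
	p, ok := providers[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	return &p, nil
}

func main() {
	RegisterProvider("DefaultProvider",
		[]string{"PodFitsResources", "PodFitsHostPorts"},
		[]string{"LeastRequestedPriority"})

	p, err := GetProvider("DefaultProvider")
	fmt.Println(p.PredicateKeys, p.PriorityKeys, err)

	_, err = GetProvider("NoSuchProvider")
	fmt.Println(err)
}
```

A provider is therefore nothing more than two sets of names. The actual predicate and priority implementations are resolved later through the other registries.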
3.5 Registering the default scheduler
So when is the registerAlgorithmProvider function in defaults called?
The answer is in the init function of pkg/scheduler/algorithmprovider/defaults/defaults.go.
// pkg/scheduler/algorithmprovider/defaults/defaults.go
func init() {
...
registerAlgorithmProvider(defaultPredicates(), defaultPriorities())
...
}
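In other words, as soon as the package containing pkg/scheduler/algorithmprovider/defaults/defaults.go is imported, Go runs its init function at program start, and the default scheduler is registered in the global variable algorithmProviderMap.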
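The effect of init can be seen in a single-file sketch. The registry variable and the names stored in it are hypothetical. In the real layout the init function lives in a different package from main and is triggered by importing that package, whereas here both sit in one file to keep the example self-contained.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for algorithmProviderMap.
// Package-level variables are initialized before any init function runs.
var registry = map[string][]string{}

// init runs automatically before main, without being called explicitly.
func init() {
	registry["DefaultProvider"] = []string{"PodFitsResources"}
}

func main() {
	// By the time main starts, the registration has already happened.
	fmt.Println(registry["DefaultProvider"])
}
```

This is why registration code of this kind needs no explicit call site: making sure the package is part of the build is enough.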
3.6 Using the default scheduler
When kube-scheduler starts, it enters the New function in pkg/scheduler/scheduler.go to create the Scheduler instance.
// pkg/scheduler/scheduler.go
// New returns a Scheduler
func New(client clientset.Interface,
nodeInformer coreinformers.NodeInformer,
podInformer coreinformers.PodInformer,
pvInformer coreinformers.PersistentVolumeInformer,
pvcInformer coreinformers.PersistentVolumeClaimInformer,
replicationControllerInformer coreinformers.ReplicationControllerInformer,
replicaSetInformer appsinformers.ReplicaSetInformer,
statefulSetInformer appsinformers.StatefulSetInformer,
serviceInformer coreinformers.ServiceInformer,
pdbInformer policyinformers.PodDisruptionBudgetInformer,
storageClassInformer storageinformers.StorageClassInformer,
recorder record.EventRecorder,
schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
stopCh <-chan struct{},
opts ...func(o *schedulerOptions)) (*Scheduler, error) {
...
source := schedulerAlgorithmSource
switch {
case source.Provider != nil:
// Create the config from a named algorithm provider.
sc, err := configurator.CreateFromProvider(*source.Provider)
if err != nil {
return nil, fmt.Errorf("couldn't create scheduler using provider %q: %v", *source.Provider, err)
}
config = sc
case source.Policy != nil:
// Create the config from a user specified policy source.
policy := &schedulerapi.Policy{}
switch {
case source.Policy.File != nil:
if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
return nil, err
}
case source.Policy.ConfigMap != nil:
if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
return nil, err
}
}
sc, err := configurator.CreateFromConfig(*policy)
if err != nil {
return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
}
config = sc
default:
return nil, fmt.Errorf("unsupported algorithm source: %v", source)
}
...
}
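1. If the config parameter is set in the kube-scheduler start command, meaning the user supplies their own predicates and priorities (this is analyzed in the custom scheduler part), execution enters the case source.Policy != nil: branch.
2. If it is not set, execution enters the case source.Provider != nil: branch, because *source.Provider is then DefaultProvider. The call configurator.CreateFromProvider(*source.Provider) in turn goes into pkg/scheduler/factory/factory.go, because configurator is a configFactory object at this point.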
// pkg/scheduler/factory/factory.go
func (c *configFactory) CreateFromProvider(providerName string) (*Config, error) {
klog.V(2).Infof("Creating scheduler from algorithm provider '%v'", providerName)
provider, err := GetAlgorithmProvider(providerName)
if err != nil {
return nil, err
}
return c.CreateFromKeys(provider.FitPredicateKeys, provider.PriorityFunctionKeys, []algorithm.SchedulerExtender{})
}
This method calls GetAlgorithmProvider from pkg/scheduler/factory/plugins.go, which is how it obtains the configuration (predicates and priorities) of the default scheduler (DefaultProvider).
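The branch selection in New can be reduced to a short sketch. AlgorithmSource and PolicySource here are hypothetical simplified types: the real policy source has separate File and ConfigMap fields, which are collapsed into a single FilePath string for illustration, and describe only reports which branch would be taken.

```go
package main

import (
	"errors"
	"fmt"
)

// PolicySource is a simplified stand-in for a user-supplied policy source.
type PolicySource struct{ FilePath string }

// AlgorithmSource is a simplified stand-in for SchedulerAlgorithmSource:
// at most one of the two pointers is expected to be non-nil.
type AlgorithmSource struct {
	Provider *string
	Policy   *PolicySource
}

const DefaultProvider = "DefaultProvider"

// defaultSource models the case where the user configured nothing.
func defaultSource() AlgorithmSource {
	name := DefaultProvider
	return AlgorithmSource{Provider: &name}
}

// describe reports which branch of the switch a source would take.
func describe(src AlgorithmSource) (string, error) {
	switch {
	case src.Provider != nil:
		return "provider " + *src.Provider, nil
	case src.Policy != nil:
		return "policy from " + src.Policy.FilePath, nil
	default:
		return "", errors.New("unsupported algorithm source")
	}
}

func main() {
	sources := []AlgorithmSource{
		defaultSource(),
		{Policy: &PolicySource{FilePath: "policy.json"}},
		{},
	}
	for _, src := range sources {
		desc, err := describe(src)
		fmt.Println(desc, err)
	}
}
```

Using pointer fields lets "not set" be distinguished from an empty value, which is what the expressionless switch relies on.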
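4. Summary
This article analyzed how the default scheduler is registered and how it is used, which mainly involves two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go. It should also help with registering predicates and priorities for a custom scheduler, since a custom scheduler writes to the same global variables described above.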