[k8s source code analysis][kube-scheduler] scheduler/algorithmprovider: registering the default scheduler

1. Preface

When reposting, please credit the original source and respect the author's work!

This article analyzes how the default scheduler is registered and how it is used. It mainly involves two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go.
Source: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)

2. Registering the default scheduler

You have probably come across a file like the one below, whether or not you fully understand it. The rest of this article should help make sense of it.

{
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
      {"name" : "PodFitsHostPorts"},
      {"name" : "PodFitsResources"},
      {"name" : "NoDiskConflict"},
      {"name" : "MatchNodeSelector"},
      {"name" : "HostName"}
    ],
    "priorities" : [
      {"name" : "LeastRequestedPriority", "weight" : 1},
      {"name" : "BalancedResourceAllocation", "weight" : 1},
      {"name" : "ServiceSpreadingPriority", "weight" : 1},
      {"name" : "EqualPriority", "weight" : 1}
    ]
}
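To make the shape of this file concrete, here is a minimal sketch that decodes it with Go's standard encoding/json package. The structs policy, predicatePolicy and priorityPolicy are hypothetical, simplified stand-ins that only mirror the fields shown above; the real type in kube-scheduler is schedulerapi.Policy, which has more fields. Note that a strict JSON parser rejects a trailing comma before the closing brace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified mirror of the Policy file above.
// The real schedulerapi.Policy type has more fields.
type predicatePolicy struct {
	Name string `json:"name"`
}

type priorityPolicy struct {
	Name   string `json:"name"`
	Weight int    `json:"weight"`
}

type policy struct {
	Kind       string            `json:"kind"`
	APIVersion string            `json:"apiVersion"`
	Predicates []predicatePolicy `json:"predicates"`
	Priorities []priorityPolicy  `json:"priorities"`
}

// parsePolicy decodes a Policy document; encoding/json returns an error
// for invalid JSON such as a trailing comma after the last field.
func parsePolicy(data []byte) (*policy, error) {
	p := &policy{}
	if err := json.Unmarshal(data, p); err != nil {
		return nil, err
	}
	return p, nil
}

const samplePolicy = `{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [{"name": "PodFitsHostPorts"}, {"name": "HostName"}],
  "priorities": [{"name": "LeastRequestedPriority", "weight": 1}]
}`

func main() {
	p, err := parsePolicy([]byte(samplePolicy))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, pred := range p.Predicates {
		fmt.Println("predicate:", pred.Name)
	}
	for _, pri := range p.Priorities {
		fmt.Printf("priority: %s (weight %d)\n", pri.Name, pri.Weight)
	}
}
```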

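When kube-scheduler needs to schedule a pod and there are a number of candidate nodes, how does it decide which node the pod goes to?
As is well known, kube-scheduler first runs the predicates to filter out the nodes that can run the pod (for example, some nodes cannot run it because of insufficient resources or node affinity), and then runs the priorities over the nodes that passed, picking the node with the highest score as the one the pod will run on.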

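The predicates are the checks a node must pass, such as PodFitsHostPorts and PodFitsResources in the file above.
Each priority function has a weight, and the pod's score on a node is the weighted sum of the scores produced by all priority functions.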
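The two phases can be sketched as follows. This is a deliberately simplified model, not kube-scheduler's code: the node type, the predicate and priority signatures and the schedule function are all hypothetical, and ties are simply resolved in favor of the first node found, whereas the real scheduler has its own tie-breaking.

```go
package main

import "fmt"

// Hypothetical simplified types; the real predicate and priority
// signatures take a *v1.Pod, metadata and a NodeInfo.
type node struct {
	Name    string
	FreeCPU int // millicores still available
}

type predicate func(podCPU int, n node) bool

type priority struct {
	Score  func(podCPU int, n node) int // per-node score
	Weight int
}

// schedule keeps only the nodes that pass every predicate, then picks the
// node with the highest weighted sum of priority scores.
func schedule(podCPU int, nodes []node, preds []predicate, pris []priority) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		fits := true
		for _, p := range preds {
			if !p(podCPU, n) {
				fits = false
				break
			}
		}
		if !fits {
			continue // filtered out by a predicate
		}
		total := 0
		for _, pr := range pris {
			total += pr.Weight * pr.Score(podCPU, n)
		}
		if !found || total > bestScore {
			best, bestScore, found = n.Name, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 3000}, {"node-c", 1500}}
	preds := []predicate{func(podCPU int, n node) bool { return n.FreeCPU >= podCPU }}
	pris := []priority{{
		// Prefer nodes with more CPU left after placement, capped at 10.
		Score: func(podCPU int, n node) int {
			s := (n.FreeCPU - podCPU) / 200
			if s > 10 {
				s = 10
			}
			return s
		},
		Weight: 1,
	}}
	fmt.Println(schedule(1000, nodes, preds, pris)) // node-b true
}
```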

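Before looking at how the default scheduler is registered, we first need to look at pkg/scheduler/factory/plugins.go, because that file provides the machinery for registering schedulers.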

3. pkg/scheduler/factory/plugins.go

type PluginFactoryArgs struct {
    PodLister                      algorithm.PodLister
    ServiceLister                  algorithm.ServiceLister
    ControllerLister               algorithm.ControllerLister
    ReplicaSetLister               algorithm.ReplicaSetLister
    StatefulSetLister              algorithm.StatefulSetLister
    NodeLister                     algorithm.NodeLister
    PDBLister                      algorithm.PDBLister
    NodeInfo                       predicates.NodeInfo
    PVInfo                         predicates.PersistentVolumeInfo
    PVCInfo                        predicates.PersistentVolumeClaimInfo
    StorageClassInfo               predicates.StorageClassInfo
    VolumeBinder                   *volumebinder.VolumeBinder
    HardPodAffinitySymmetricWeight int32
}
type FitPredicateFactory func(PluginFactoryArgs) algorithm.FitPredicate
type PriorityFunctionFactory func(PluginFactoryArgs) algorithm.PriorityFunction
type PriorityFunctionFactory2 func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction)

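FitPredicateFactory: builds a predicate function from PluginFactoryArgs.
PriorityFunctionFactory: builds a priority function from PluginFactoryArgs (old style).
PriorityFunctionFactory2: builds a priority function from PluginFactoryArgs (new style), returning a Map function and a Reduce function.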
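Registering factories rather than ready-made functions presumably lets a predicate close over data (listers, PV/PVC info and so on) that only becomes available once the scheduler is being constructed. A minimal sketch of that pattern, where pluginArgs, fitPredicate, fitPredicateFactory and newNodeSelectorFactory are hypothetical simplified stand-ins for the real types:

```go
package main

import "fmt"

// Hypothetical stand-ins for PluginFactoryArgs and algorithm.FitPredicate.
type pluginArgs struct {
	NodeLabels map[string]map[string]string // node name -> labels
}

type fitPredicate func(podNodeSelector map[string]string, nodeName string) bool

// fitPredicateFactory mirrors the idea of FitPredicateFactory: it receives
// the shared arguments and returns a ready-to-use predicate.
type fitPredicateFactory func(args pluginArgs) fitPredicate

// newNodeSelectorFactory returns a factory whose predicate closes over
// label data that only exists once args are known.
func newNodeSelectorFactory() fitPredicateFactory {
	return func(args pluginArgs) fitPredicate {
		return func(selector map[string]string, nodeName string) bool {
			labels := args.NodeLabels[nodeName] // nil for unknown nodes
			for k, v := range selector {
				if labels[k] != v {
					return false
				}
			}
			return true
		}
	}
}

func main() {
	args := pluginArgs{NodeLabels: map[string]map[string]string{
		"node-a": {"disk": "ssd"},
		"node-b": {"disk": "hdd"},
	}}
	pred := newNodeSelectorFactory()(args)
	fmt.Println(pred(map[string]string{"disk": "ssd"}, "node-a")) // true
	fmt.Println(pred(map[string]string{"disk": "ssd"}, "node-b")) // false
}
```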

3.1 Basic structures

type PriorityConfigFactory struct {
    Function          PriorityFunctionFactory
    MapReduceFunction PriorityFunctionFactory2
    Weight            int
}

var (
    schedulerFactoryMutex sync.Mutex

    // maps that hold registered algorithm types
    fitPredicateMap        = make(map[string]FitPredicateFactory)
    mandatoryFitPredicates = sets.NewString()
    priorityFunctionMap    = make(map[string]PriorityConfigFactory)
    algorithmProviderMap   = make(map[string]AlgorithmProviderConfig)

    // Registered metadata producers
    priorityMetadataProducer  PriorityMetadataProducerFactory
    predicateMetadataProducer PredicateMetadataProducerFactory
)

const (
    // DefaultProvider defines the default algorithm provider name.
    DefaultProvider = "DefaultProvider"
)
type AlgorithmProviderConfig struct {
    FitPredicateKeys     sets.String
    PriorityFunctionKeys sets.String
}

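The name of the default algorithm provider (what this article calls the default scheduler) is DefaultProvider.
fitPredicateMap: a global map from each predicate name to the FitPredicateFactory that builds that predicate.
priorityFunctionMap: also a global map, from each priority name to the PriorityConfigFactory that builds that priority.
algorithmProviderMap: also a global map, from a provider name (such as DefaultProvider) to the names of all its predicates and priorities (an AlgorithmProviderConfig holds both sets of names).
mandatoryFitPredicates: a global set holding the names of the mandatory predicates.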

3.2 Registering predicates

// pkg/scheduler/factory/plugins.go

func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string {
    return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
}
// validName is the regexp used to check that a predicate (or other algorithm) name is valid
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

func validateAlgorithmNameOrDie(name string) {
    if !validName.MatchString(name) {
        klog.Fatalf("Algorithm name %v does not match the name validation regexp \"%v\".", name, validName)
    }
}
func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    fitPredicateMap[name] = predicateFactory
    return name
}

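This is simple: RegisterFitPredicate takes a predicate name and a predicate function, and registers a FitPredicateFactory that ignores its arguments and just returns the given predicate. It then returns name.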
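Here is a minimal sketch of the same registration pattern: a mutex-guarded map, name validation with the same regexp as validName, and a wrapper that turns a plain predicate into a factory. The lower-case functions and types are hypothetical; unlike validateAlgorithmNameOrDie, the sketch returns an error instead of calling klog.Fatalf so it can be tested. One detail the regexp implies: because the parenthesized group is not optional, a one-character name is rejected as well.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Hypothetical simplified types; the factory argument stands in for PluginFactoryArgs.
type fitPredicate func(nodeName string) bool
type fitPredicateFactory func(args struct{}) fitPredicate

var (
	mu              sync.Mutex
	fitPredicateMap = map[string]fitPredicateFactory{}
	// Same pattern as validName in plugins.go.
	validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")
)

// registerFitPredicateFactory validates the name and stores the factory.
func registerFitPredicateFactory(name string, f fitPredicateFactory) (string, error) {
	mu.Lock()
	defer mu.Unlock()
	if !validName.MatchString(name) {
		return "", fmt.Errorf("algorithm name %q does not match %q", name, validName.String())
	}
	fitPredicateMap[name] = f
	return name, nil
}

// registerFitPredicate wraps a ready-made predicate in a factory that
// ignores its arguments, in the spirit of RegisterFitPredicate.
func registerFitPredicate(name string, p fitPredicate) (string, error) {
	return registerFitPredicateFactory(name, func(struct{}) fitPredicate { return p })
}

func main() {
	alwaysFits := func(string) bool { return true }
	for _, name := range []string{"HostName", "bad_name", "X"} {
		if _, err := registerFitPredicate(name, alwaysFits); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("registered:", name)
		}
	}
}
```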

RegisterFitPredicateFactory, shown above, registers a caller-supplied FitPredicateFactory as is: it validates the name, puts the factory into fitPredicateMap and returns the name. RegisterMandatoryFitPredicate does one extra step: it also adds the name to mandatoryFitPredicates.

func RegisterMandatoryFitPredicate(name string, predicate algorithm.FitPredicate) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    fitPredicateMap[name] = func(PluginFactoryArgs) algorithm.FitPredicate { return predicate }
    mandatoryFitPredicates.Insert(name)
    return name
}
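The point of the mandatory set is that those predicates are applied even when a provider or Policy does not list them. A minimal sketch of that idea follows; buildPredicates and the lower-case registration functions are hypothetical simplifications (the mutex is omitted), not the actual plugins.go code.

```go
package main

import "fmt"

type fitPredicate func(nodeName string) bool

var (
	fitPredicateMap        = map[string]fitPredicate{}
	mandatoryFitPredicates = map[string]struct{}{}
)

func registerFitPredicate(name string, p fitPredicate) string {
	fitPredicateMap[name] = p
	return name
}

// registerMandatoryFitPredicate also records the name in the mandatory set.
func registerMandatoryFitPredicate(name string, p fitPredicate) string {
	fitPredicateMap[name] = p
	mandatoryFitPredicates[name] = struct{}{}
	return name
}

// buildPredicates (hypothetical helper) resolves the requested names and
// then adds every mandatory predicate, even if it was not requested.
func buildPredicates(keys []string) (map[string]fitPredicate, error) {
	out := map[string]fitPredicate{}
	for _, k := range keys {
		p, ok := fitPredicateMap[k]
		if !ok {
			return nil, fmt.Errorf("predicate %q is not registered", k)
		}
		out[k] = p
	}
	for k := range mandatoryFitPredicates {
		out[k] = fitPredicateMap[k]
	}
	return out, nil
}

func main() {
	registerMandatoryFitPredicate("CheckNodeCondition", func(string) bool { return true })
	registerFitPredicate("HostName", func(string) bool { return true })
	preds, err := buildPredicates([]string{"HostName"})
	if err != nil {
		fmt.Println(err)
		return
	}
	_, hasMandatory := preds["CheckNodeCondition"]
	fmt.Println(len(preds), hasMandatory) // 2 true
}
```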

Next, let's look at how defaultPredicates in pkg/scheduler/algorithmprovider/defaults/defaults.go registers the predicates.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func defaultPredicates() sets.String {
    return sets.NewString(
        factory.RegisterFitPredicateFactory(
            predicates.NoVolumeZoneConflictPred,
            func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo, args.StorageClassInfo)
            },
        ),
        ...
        factory.RegisterMandatoryFitPredicate(predicates.CheckNodeConditionPred, predicates.CheckNodeConditionPredicate),
        factory.RegisterFitPredicate(predicates.PodToleratesNodeTaintsPred, predicates.PodToleratesNodeTaints),
        ...
    )
}

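It calls RegisterFitPredicateFactory, RegisterMandatoryFitPredicate and RegisterFitPredicate, so the global fitPredicateMap ends up holding every registered predicate name together with the factory that builds that predicate.
Because each register function returns the name, the return value of defaultPredicates() is the set of predicate names registered inside that call. fitPredicateMap may hold additional keys registered elsewhere, for example by other registrations in init().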
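The "register returns its name" trick can be sketched like this. The plain map returned by newStringSet stands in for sets.String, and register, defaultPredicates and the predicate names here are hypothetical simplifications; the sketch shows that the returned set contains only the names registered in that call, while the global map can contain more.

```go
package main

import "fmt"

var fitPredicateMap = map[string]func(string) bool{}

// register stores the predicate and returns its name, like the factory functions.
func register(name string, p func(string) bool) string {
	fitPredicateMap[name] = p
	return name
}

// newStringSet mimics sets.NewString: it collects the given names.
func newStringSet(items ...string) map[string]struct{} {
	s := make(map[string]struct{}, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// defaultPredicates registers each predicate and, because register returns
// the name, collects exactly the names registered here.
func defaultPredicates() map[string]struct{} {
	alwaysTrue := func(string) bool { return true }
	return newStringSet(
		register("NoVolumeZoneConflict", alwaysTrue),
		register("PodToleratesNodeTaints", alwaysTrue),
	)
}

func main() {
	// Registered outside the default set, e.g. elsewhere in init().
	register("PodFitsPorts", func(string) bool { return true })
	defaults := defaultPredicates()
	fmt.Println(len(defaults), len(fitPredicateMap)) // 2 3
}
```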

3.3 Registering priorities

// pkg/scheduler/factory/plugins.go
func RegisterPriorityConfigFactory(name string, pcf PriorityConfigFactory) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    priorityFunctionMap[name] = pcf
    return name
}
func RegisterPriorityFunction2(
    name string,
    mapFunction algorithm.PriorityMapFunction,
    reduceFunction algorithm.PriorityReduceFunction,
    weight int) string {
    return RegisterPriorityConfigFactory(name, PriorityConfigFactory{
        MapReduceFunction: func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
            return mapFunction, reduceFunction
        },
        Weight: weight,
    })
}

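RegisterPriorityFunction2 was added in a later version to support map-reduce priority functions. To stay compatible with the older style, every priority is registered as a PriorityConfigFactory, which has a Function field for old-style priorities and a MapReduceFunction field for new-style ones. RegisterPriorityFunction2 only fills in MapReduceFunction and Weight, then returns name.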

Next, let's look at how defaultPriorities in pkg/scheduler/algorithmprovider/defaults/defaults.go registers the priorities.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func defaultPriorities() sets.String {
    return sets.NewString(
        // spreads pods by minimizing the number of pods (belonging to the same service or replication controller) on the same node.
        factory.RegisterPriorityConfigFactory(
            "SelectorSpreadPriority",
            factory.PriorityConfigFactory{
                MapReduceFunction: func(args factory.PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
                    return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister, args.StatefulSetLister)
                },
                Weight: 1,
            },
        ),
        ...
        factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1),
    )
}

This works just like the predicates: the registered priorities all end up in the global priorityFunctionMap, and defaultPriorities() returns the names of the priorities it registered.

3.4 Registering a scheduler

Registering a scheduler takes the scheduler's name (name), the names of its predicates (predicateKeys) and the names of its priorities (priorityKeys):

// pkg/scheduler/factory/plugins.go

func RegisterAlgorithmProvider(name string, predicateKeys, priorityKeys sets.String) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    algorithmProviderMap[name] = AlgorithmProviderConfig{
        FitPredicateKeys:     predicateKeys,
        PriorityFunctionKeys: priorityKeys,
    }
    return name
}

Next, let's look at how registerAlgorithmProvider in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.

// pkg/scheduler/algorithmprovider/defaults/defaults.go
func registerAlgorithmProvider(predSet, priSet sets.String) {
    factory.RegisterAlgorithmProvider(factory.DefaultProvider, predSet, priSet)
    ...
}

This method stores the default scheduler in the global algorithmProviderMap, so the default scheduler can then be retrieved via algorithmProviderMap["DefaultProvider"].

//  pkg/scheduler/factory/plugins.go

func GetAlgorithmProvider(name string) (*AlgorithmProviderConfig, error) {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()

    provider, ok := algorithmProviderMap[name]
    if !ok {
        return nil, fmt.Errorf("plugin %q has not been registered", name)
    }

    return &provider, nil
}

GetAlgorithmProvider looks a scheduler up by name, so GetAlgorithmProvider("DefaultProvider") returns the default scheduler's configuration.
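The register/get pair can be sketched with simplified, hypothetical types (plain maps stand in for sets.String). One detail worth noticing: returning &provider hands back a pointer to a copy of the stored struct, but the maps inside it are still shared with the registry, because Go maps are reference types.

```go
package main

import (
	"fmt"
	"sync"
)

type algorithmProviderConfig struct {
	FitPredicateKeys     map[string]struct{}
	PriorityFunctionKeys map[string]struct{}
}

var (
	mu                   sync.Mutex
	algorithmProviderMap = map[string]algorithmProviderConfig{}
)

func registerAlgorithmProvider(name string, preds, pris map[string]struct{}) string {
	mu.Lock()
	defer mu.Unlock()
	algorithmProviderMap[name] = algorithmProviderConfig{preds, pris}
	return name
}

// getAlgorithmProvider returns a pointer to a copy of the stored config,
// or an error if the name was never registered.
func getAlgorithmProvider(name string) (*algorithmProviderConfig, error) {
	mu.Lock()
	defer mu.Unlock()
	provider, ok := algorithmProviderMap[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	return &provider, nil
}

func main() {
	registerAlgorithmProvider("DefaultProvider",
		map[string]struct{}{"PodToleratesNodeTaints": {}},
		map[string]struct{}{"ImageLocalityPriority": {}})
	if p, err := getAlgorithmProvider("DefaultProvider"); err == nil {
		fmt.Println(len(p.FitPredicateKeys), len(p.PriorityFunctionKeys)) // 1 1
	}
	_, err := getAlgorithmProvider("Missing")
	fmt.Println(err) // plugin "Missing" has not been registered
}
```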

3.5 Registering the default scheduler

So when is registerAlgorithmProvider in defaults called?
Look at the init function in pkg/scheduler/algorithmprovider/defaults/defaults.go.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func init() {
    ...
    registerAlgorithmProvider(defaultPredicates(), defaultPriorities())
    ...
}

In other words, as soon as the pkg/scheduler/algorithmprovider/defaults package is imported, its init function runs and registers the default scheduler in the global algorithmProviderMap.
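This relies on standard Go semantics: a package's init functions run after its package-level variables are initialized and before anything that depends on the package runs, so importing the package (even with a blank import `_`) is enough to trigger registration. A single-file sketch of the effect, with hypothetical simplified names; in the real code the init lives in the defaults package rather than in main:

```go
package main

import (
	"fmt"
	"sync"
)

type providerConfig struct {
	Predicates []string
	Priorities []string
}

var (
	mu                   sync.Mutex
	algorithmProviderMap = map[string]providerConfig{}
)

func registerAlgorithmProvider(name string, cfg providerConfig) {
	mu.Lock()
	defer mu.Unlock()
	algorithmProviderMap[name] = cfg
}

// init runs automatically after the package-level variables above are
// initialized and before main, so the provider is already registered.
func init() {
	registerAlgorithmProvider("DefaultProvider", providerConfig{
		Predicates: []string{"PodToleratesNodeTaints"},
		Priorities: []string{"ImageLocalityPriority"},
	})
}

func main() {
	cfg, ok := algorithmProviderMap["DefaultProvider"]
	fmt.Println(ok, cfg.Predicates) // true [PodToleratesNodeTaints]
}
```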

3.6 Using the default scheduler

When kube-scheduler starts, it calls New in pkg/scheduler/scheduler.go to create the Scheduler instance.

// pkg/scheduler/scheduler.go

// New returns a Scheduler
func New(client clientset.Interface,
    nodeInformer coreinformers.NodeInformer,
    podInformer coreinformers.PodInformer,
    pvInformer coreinformers.PersistentVolumeInformer,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    replicationControllerInformer coreinformers.ReplicationControllerInformer,
    replicaSetInformer appsinformers.ReplicaSetInformer,
    statefulSetInformer appsinformers.StatefulSetInformer,
    serviceInformer coreinformers.ServiceInformer,
    pdbInformer policyinformers.PodDisruptionBudgetInformer,
    storageClassInformer storageinformers.StorageClassInformer,
    recorder record.EventRecorder,
    schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
    stopCh <-chan struct{},
    opts ...func(o *schedulerOptions)) (*Scheduler, error) {
...
    source := schedulerAlgorithmSource
    switch {
    case source.Provider != nil:
        // Create the config from a named algorithm provider.
        sc, err := configurator.CreateFromProvider(*source.Provider)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler using provider %q: %v", *source.Provider, err)
        }
        config = sc
    case source.Policy != nil:
        // Create the config from a user specified policy source.
        policy := &schedulerapi.Policy{}
        switch {
        case source.Policy.File != nil:
            if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
                return nil, err
            }
        case source.Policy.ConfigMap != nil:
            if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
                return nil, err
            }
        }
        sc, err := configurator.CreateFromConfig(*policy)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
        }
        config = sc
    default:
        return nil, fmt.Errorf("unsupported algorithm source: %v", source)
    }
...
}

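1. If kube-scheduler is started with a scheduling Policy, meaning the user chooses the predicates and priorities (for example via a policy file or a policy ConfigMap; this is analyzed in the custom scheduler part), execution goes into the case source.Policy != nil: branch.
2. Otherwise it goes into the case source.Provider != nil: branch, since *source.Provider is then DefaultProvider. configurator.CreateFromProvider(*source.Provider) then runs the code in pkg/scheduler/factory/factory.go, because configurator here is a configFactory object.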
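The branch selection in New can be sketched as a tiny function. algorithmSource and describeSource are hypothetical simplifications of SchedulerAlgorithmSource and the switch above: only the order of the checks (Provider first, then Policy, otherwise an error) is taken from the original code, and the Policy side is reduced to a single file path.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical simplified SchedulerAlgorithmSource; at most one of the
// two fields is expected to be set.
type algorithmSource struct {
	Provider   *string
	PolicyFile *string
}

// describeSource follows the order of the switch in New: a Provider is
// used if set, otherwise a Policy, otherwise the source is unsupported.
func describeSource(src algorithmSource) (string, error) {
	switch {
	case src.Provider != nil:
		return "provider:" + *src.Provider, nil
	case src.PolicyFile != nil:
		return "policy file:" + *src.PolicyFile, nil
	default:
		return "", errors.New("unsupported algorithm source")
	}
}

func main() {
	provider := "DefaultProvider"
	fmt.Println(describeSource(algorithmSource{Provider: &provider}))
	// provider:DefaultProvider <nil>
}
```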

// pkg/scheduler/factory/factory.go

func (c *configFactory) CreateFromProvider(providerName string) (*Config, error) {
    klog.V(2).Infof("Creating scheduler from algorithm provider '%v'", providerName)
    provider, err := GetAlgorithmProvider(providerName)
    if err != nil {
        return nil, err
    }
    return c.CreateFromKeys(provider.FitPredicateKeys, provider.PriorityFunctionKeys, []algorithm.SchedulerExtender{})
}

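This method calls GetAlgorithmProvider from pkg/scheduler/factory/plugins.go, which returns the configuration of the default scheduler (DefaultProvider), i.e. its predicate and priority names, and then passes those names to CreateFromKeys.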
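The overall flow from a provider name to concrete functions can be sketched as below. createFromProvider is a hypothetical, heavily simplified combination of CreateFromProvider and CreateFromKeys that only handles predicates; the real CreateFromKeys also builds the priority configs, extenders and the rest of the scheduler Config. The sketch just shows the two lookups: provider name to keys, then each key to its registered factory.

```go
package main

import "fmt"

type fitPredicate func(node string) bool

type providerConfig struct {
	FitPredicateKeys []string
}

var (
	fitPredicateMap      = map[string]func() fitPredicate{} // name -> factory
	algorithmProviderMap = map[string]providerConfig{}
)

// createFromProvider looks the provider up by name, then turns each of its
// keys into a concrete predicate by calling the registered factory.
func createFromProvider(name string) (map[string]fitPredicate, error) {
	provider, ok := algorithmProviderMap[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	preds := make(map[string]fitPredicate, len(provider.FitPredicateKeys))
	for _, key := range provider.FitPredicateKeys {
		factory, ok := fitPredicateMap[key]
		if !ok {
			return nil, fmt.Errorf("predicate %q has no registered factory", key)
		}
		preds[key] = factory()
	}
	return preds, nil
}

func main() {
	fitPredicateMap["HostName"] = func() fitPredicate {
		return func(node string) bool { return node == "node-a" }
	}
	algorithmProviderMap["DefaultProvider"] = providerConfig{FitPredicateKeys: []string{"HostName"}}

	preds, err := createFromProvider("DefaultProvider")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(preds), preds["HostName"]("node-a")) // 1 true
}
```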

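4. Summary

This article analyzed how the default scheduler is registered and how it is used, mainly through two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go. It should also help with custom schedulers, since registering custom predicates and priorities likewise writes into the global variables described above.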
