Kubernetes kube-scheduler

2022-07-21 01:25:00 【Zhang quandan, Foxconn quality inspector】

The APIServer's main responsibilities are authentication, authorization, and admission control. It determines who initiated a request, whether the initiator has the required permissions, and whether the request is valid. If the apiserver needs to change some properties of the original request, it can do so at this stage.

Once a request has passed all of these checks and is confirmed to be valid, the apiserver saves it to etcd.

The apiserver is the only component in a Kubernetes cluster that talks to the etcd database. All other components must communicate with the apiserver to learn about changes to the data.

The kubelet consists of its own framework code plus the following interface abstractions: the container runtime is abstracted as CRI, the network as CNI, and storage as CSI.

kube-scheduler

kube-scheduler is responsible for assigning Pods to nodes in the cluster. It watches kube-apiserver for Pods that have not yet been assigned a Node, then assigns nodes to these Pods according to the scheduling policy (by updating the Pod's NodeName field). The scheduler has to take many factors into account:

Fair scheduling (when many requests arrive, they should be handled fairly: everyone is equal and requests are served first come, first served. Fairness has exceptions, though. Pods can carry a scheduling priority, so an especially important application can jump the queue. Kubernetes fully supports this: Pods with the same priority are scheduled fairly, while Pods with a higher priority are placed ahead of those with a lower priority)
Efficient use of resources (find the most suitable node and schedule the Pod there)
QoS
Affinity and anti-affinity
Data locality
Inter-workload interference
Deadlines
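
The fair-scheduling rule above (first come, first served within one priority, queue-jumping across priorities) can be sketched as a small priority queue. This is only an illustration and not kube-scheduler's actual queue; the class name SchedulingQueue, the pod names, and the priority values are made up for the example.

```python
import heapq
import itertools


class SchedulingQueue:
    """Toy queue: higher priority pops first; equal priorities pop in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing counter used as FIFO tie-breaker

    def add(self, pod_name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), pod_name))

    def pop(self):
        _, _, pod_name = heapq.heappop(self._heap)
        return pod_name

    def __len__(self):
        return len(self._heap)


queue = SchedulingQueue()
queue.add("web-1")
queue.add("web-2")
queue.add("critical-db", priority=1000)  # arrives third but jumps the queue
queue.add("web-3")

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The three web pods share a priority and keep their arrival order, while the higher-priority pod is served first.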

The scheduler watches the information of all compute nodes in the cluster. It needs to know how many compute nodes the cluster currently has, how healthy they are, how much of their resources is already in use, and how much can still be allocated.

Each compute node reports its own information to the apiserver. The scheduler watches the apiserver to obtain this node information, which gives it a global view of the cluster.

On one side, the scheduler has a global view of the compute resources of every node in the cluster. On the other side, it receives users' Pods, which are scheduling requests from its point of view, and looks for the best node for each of them. Once the best node is found, the scheduler completes the binding between the Pod and the node. In essence, it fills in the Pod's nodeName field.

Users do not fill in nodeName when they create a Pod, because they do not know where the Pod will be scheduled. When the scheduler sees a Pod whose nodeName is empty, it knows the Pod needs scheduling, so it finds a suitable node and fills in nodeName.
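
As a rough sketch of that idea, the snippet below models a Pod as a small data class and treats "nodeName is empty" as "needs scheduling". The Pod class, the helper names, and the node names are hypothetical, and the node is hard-coded here, whereas the real scheduler chooses it through filtering and scoring.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    node_name: Optional[str] = None  # stands in for the Pod's nodeName field


def pending_pods(pods: List[Pod]) -> List[Pod]:
    """Return the pods that have not been bound to a node yet."""
    return [pod for pod in pods if not pod.node_name]


def bind(pod: Pod, node_name: str) -> None:
    """Binding boils down to filling in the node name."""
    pod.node_name = node_name


pods = [Pod("already-placed", "node-1"), Pod("new-a"), Pod("new-b")]

for pod in pending_pods(pods):
    bind(pod, "node-2")  # placeholder choice, purely for illustration

print([(pod.name, pod.node_name) for pod in pods])
```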

Scheduler

kube-scheduler schedules in two stages, predicate and priority:

predicate: filters out nodes that do not qualify (filter).

priority: ranks the remaining nodes and selects the one with the highest priority (score).

Filter: suppose there are 100 machines and one Pod request. The scheduler first checks which nodes cannot meet the Pod's needs and filters them out.

After filtering, there may still be 10 machines that meet the Pod's needs. These then have to be ranked, and ranking means scoring each node on a range of factors.
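
The two stages can be sketched as follows. This is a minimal illustration rather than the scheduler's real code: nodes and pods are plain dictionaries, and the function names, the single predicate, and the single priority function are assumptions made for the example.

```python
def schedule(pod, nodes, predicates, priorities):
    """Filter out unfit nodes, then return the highest-scoring survivor (or None)."""
    feasible = [n for n in nodes if all(pred(pod, n) for pred in predicates)]
    if not feasible:
        return None  # no node satisfies every predicate
    return max(feasible, key=lambda n: sum(prio(pod, n) for prio in priorities))


def has_enough_cpu(pod, node):
    # Predicate: the node must have at least as much free CPU as the pod requests.
    return node["free_cpu"] >= pod["cpu"]


def most_cpu_left(pod, node):
    # Priority: prefer the node that keeps the most CPU free after placement.
    return node["free_cpu"] - pod["cpu"]


nodes = [
    {"name": "node-a", "free_cpu": 1},
    {"name": "node-b", "free_cpu": 4},
    {"name": "node-c", "free_cpu": 8},
]
pod = {"name": "demo", "cpu": 2}

chosen = schedule(pod, nodes, [has_enough_cpu], [most_cpu_left])
print(chosen["name"])
```

Here node-a is removed by the filter stage, and node-c wins the scoring stage over node-b.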

Predicates strategy

Scheduling has many factors to consider, and each factor is implemented as a scheduler plugin. Running the predicate stage amounts to iterating over these predicate plugins and executing them one by one.

PodFitsResources: Checks whether the Node has enough resources, including the allowed number of Pods, CPU, memory, number of GPUs, and other OpaqueIntResources. (The scheduler first checks which nodes cannot satisfy the Pod's resource needs; every machine without sufficient resources is eliminated.)

PodFitsHostPorts: Checks whether there is a HostPort conflict.

PodFitsPorts: Same as PodFitsHostPorts. (Some Pods want to occupy a host port. During scheduling the scheduler has to check whether that port is free. If the port is already taken, the Pod cannot be placed on that node.)

HostName: Checks whether pod.Spec.NodeName matches the candidate node.

MatchNodeSelector: Checks whether the candidate node matches pod.Spec.NodeSelector. (The Pod is scheduled only to matching nodes.)

NoVolumeZoneConflict: Checks whether there is a volume zone conflict.

MatchInterPodAffinity: Checks whether the node satisfies the Pod's affinity requirements.

NoDiskConflict: Checks whether there is a Volume conflict. Applies only to GCE PD, AWS EBS, Ceph RBD, and iSCSI.

PodToleratesNodeTaints: Checks whether the Pod tolerates the Node's taints.

CheckNodeMemoryPressure: Checks whether the Pod can be scheduled to a node under MemoryPressure.

CheckNodeDiskPressure: Checks whether the Pod can be scheduled to a node under DiskPressure.

NoVolumeNodeConflict: Checks whether the node satisfies the conditions of the Volumes the Pod references.

There are many other policies, and you can also write your own.
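
To make a few of these checks concrete, here is a heavily simplified sketch of three predicates named after entries in the list above. The dictionary layout and field names are assumptions for illustration, and the taint check uses exact tuple matching only, which is much cruder than real toleration matching.

```python
def pod_fits_host_ports(pod, node):
    """True if none of the host ports the pod wants is already taken on the node."""
    wanted = set(pod.get("host_ports", []))
    return not (wanted & set(node.get("used_host_ports", [])))


def match_node_selector(pod, node):
    """True if every key/value pair in the pod's node selector is a label on the node."""
    labels = node.get("labels", {})
    return all(labels.get(k) == v for k, v in pod.get("node_selector", {}).items())


def pod_tolerates_node_taints(pod, node):
    """True if every taint on the node appears in the pod's tolerations (exact match)."""
    tolerations = pod.get("tolerations", [])
    return all(taint in tolerations for taint in node.get("taints", []))


node = {
    "labels": {"disk": "ssd"},
    "used_host_ports": [80],
    "taints": [("dedicated", "db", "NoSchedule")],
}
pod = {
    "host_ports": [8080],
    "node_selector": {"disk": "ssd"},
    "tolerations": [("dedicated", "db", "NoSchedule")],
}

print(pod_fits_host_ports(pod, node), match_node_selector(pod, node),
      pod_tolerates_node_taints(pod, node))
```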

How predicate plugins work

When a Pod is scheduled, the scheduler walks through the predicate plugins one by one. Each plugin filters out another batch of machines, and the machines that remain at the end are the ones that meet the scheduling requirements.
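
The narrowing effect can be sketched by recording which nodes survive each plugin. This is an illustrative model only: the function run_predicates, the two lambda predicates, and the node data are invented for the example.

```python
def run_predicates(pod, nodes, predicates):
    """Apply predicates in order; return the survivors and a per-plugin trace."""
    candidates = list(nodes)
    trace = []
    for name, predicate in predicates:
        candidates = [n for n in candidates if predicate(pod, n)]
        trace.append((name, [n["name"] for n in candidates]))
    return candidates, trace


predicates = [
    ("fits_cpu", lambda pod, node: node["free_cpu"] >= pod["cpu"]),
    ("fits_memory", lambda pod, node: node["free_mem"] >= pod["mem"]),
]
nodes = [
    {"name": "node-a", "free_cpu": 1, "free_mem": 8},
    {"name": "node-b", "free_cpu": 4, "free_mem": 1},
    {"name": "node-c", "free_cpu": 4, "free_mem": 8},
]
pod = {"cpu": 2, "mem": 4}

survivors, trace = run_predicates(pod, nodes, predicates)
for plugin_name, remaining in trace:
    print(plugin_name, remaining)
```

Each later plugin only sees the nodes that earlier plugins let through, so the candidate list can only shrink.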

Priorities strategy

There are also many priority plugins. The scheduler runs each of them to score every node, sums the scores for each node, and ranks the node with the highest total first.

SelectorSpreadPriority: Prefers nodes that run fewer Pods belonging to the same Service or Replication Controller.

InterPodAffinityPriority: Prefers scheduling Pods into the same topology domain (such as the same node, rack, or zone).

LeastRequestedPriority: Prefers nodes with fewer requested resources.

BalancedResourceAllocation: Prefers nodes whose resource usage stays balanced across resource types.

NodePreferAvoidPodsPriority: Decided by the scheduler.alpha.kubernetes.io/preferAvoidPods annotation. Its weight is 10000, which keeps other priority policies from affecting the outcome.
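
The sketch below shows how per-plugin scores might be combined into one weighted total. The two scoring formulas are loosely modeled on LeastRequestedPriority and BalancedResourceAllocation with a 0 to 10 scale; the exact formulas, the weights of 1, and the node figures are assumptions for illustration and should not be read as the scheduler's actual implementation.

```python
def least_requested(requested, capacity):
    # Share of the resource that is still free, scaled to 0..10.
    return (capacity - requested) * 10 / capacity


def least_requested_score(node):
    cpu = least_requested(node["cpu_requested"], node["cpu_capacity"])
    mem = least_requested(node["mem_requested"], node["mem_capacity"])
    return (cpu + mem) / 2


def balanced_allocation_score(node):
    # The closer the CPU and memory utilization fractions, the higher the score.
    cpu_fraction = node["cpu_requested"] / node["cpu_capacity"]
    mem_fraction = node["mem_requested"] / node["mem_capacity"]
    return 10 - abs(cpu_fraction - mem_fraction) * 10


def total_score(node, weighted_priorities):
    return sum(weight * fn(node) for fn, weight in weighted_priorities)


priorities = [(least_requested_score, 1), (balanced_allocation_score, 1)]
node_a = {"name": "node-a", "cpu_requested": 2, "cpu_capacity": 4,
          "mem_requested": 4, "mem_capacity": 8}
node_b = {"name": "node-b", "cpu_requested": 1, "cpu_capacity": 4,
          "mem_requested": 6, "mem_capacity": 8}

best = max([node_a, node_b], key=lambda n: total_score(n, priorities))
print(best["name"], total_score(node_a, priorities), total_score(node_b, priorities))
```

Both nodes have the same amount of free capacity on average, but node-a uses CPU and memory evenly, so it ends up with the higher total.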

Resource requirements

CPU

requests

When Kubernetes schedules a Pod, it adds up the CPU requests of the Pods already running on the node, adds the CPU request of the Pod being scheduled, and checks whether the total exceeds the node's allocatable CPU.
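
That check is simple arithmetic. A minimal sketch, assuming CPU is expressed in millicores and using a hypothetical function name and made-up numbers:

```python
def fits_cpu(allocatable_milli, running_requests_milli, new_request_milli):
    """True if existing requests plus the new request stay within allocatable CPU."""
    return sum(running_requests_milli) + new_request_milli <= allocatable_milli


allocatable = 4000            # a node with 4 allocatable cores
running = [1000, 1500, 500]   # CPU requests of pods already on the node

print(fits_cpu(allocatable, running, 1000))  # 3000m + 1000m fits exactly
print(fits_cpu(allocatable, running, 1500))  # 3000m + 1500m exceeds 4000m
```

Note that the comparison uses requests, not the CPU the running Pods actually consume at the moment.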

limits

cgroups are configured to enforce the resource limit.
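
As a rough illustration of what configuring the cgroup means, the sketch below converts limits into cgroup values. It assumes a node using cgroup v1, where a CPU limit becomes a CFS quota per period (cpu.cfs_quota_us with cpu.cfs_period_us) and a memory limit becomes a byte count (memory.limit_in_bytes). The function names and the 100 ms period are assumptions for the example, not the kubelet's code.

```python
CFS_PERIOD_US = 100_000  # assumed CFS period of 100 ms


def cfs_quota_us(limit_milli, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) a container may use per period for a millicore limit."""
    return limit_milli * period_us // 1000


def memory_limit_bytes(limit_mib):
    """Memory limit expressed in bytes, as a cgroup memory limit would hold it."""
    return limit_mib * 1024 * 1024


print(cfs_quota_us(500))        # a 500m limit: half of each period
print(cfs_quota_us(2000))       # a 2-core limit: two periods' worth of CPU time
print(memory_limit_bytes(256))  # a 256Mi limit in bytes
```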

Memory

requests

The scheduler checks whether the node's remaining memory satisfies the amount of memory the Pod requests, in order to decide whether the Pod can be scheduled to that node.

limits

cgroups are configured to enforce the resource limit.

Disk resource requirements

A container's ephemeral storage includes its logs and writable-layer data. It is requested by setting limits.ephemeral-storage and requests.ephemeral-storage in the Pod spec.

After a Pod is scheduled, the node does not enforce the ephemeral storage limit through cgroups. Instead, the kubelet periodically collects the disk usage of the container's logs and writable layer, and evicts the Pod if the usage exceeds the limit.
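
A single round of that periodic check can be modeled as below. This is only a sketch of the comparison described above: the function name and the byte figures are hypothetical, and how the kubelet actually measures usage is not shown.

```python
MIB = 1024 * 1024


def should_evict(log_bytes, writable_layer_bytes, limit_bytes):
    """True if measured ephemeral storage usage exceeds the configured limit."""
    if limit_bytes is None:
        return False  # no limit configured, so nothing to enforce in this model
    return log_bytes + writable_layer_bytes > limit_bytes


limit = 100 * MIB  # assumed limits.ephemeral-storage of 100Mi

print(should_evict(30 * MIB, 50 * MIB, limit))  # 80Mi used, under the limit
print(should_evict(70 * MIB, 50 * MIB, limit))  # 120Mi used, over the limit
```

Because enforcement is a periodic measurement rather than a cgroup limit, usage can briefly exceed the limit before the eviction happens.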

Init container resource requirements

Besides its main containers, a Pod can also have init containers that perform initialization. Istio, for example, uses an init container: once started, it configures the local iptables rules and exits when the configuration is done.

As another example, suppose an application needs a JWT token to access other applications, because the applications authenticate each other. An init container suits this well, since the token only has to be obtained once. The init container fetches the token and stores it on local disk; that disk is mounted into the init container as a volume, and the main container mounts the same volume at the same path, so it can read the token.

Most of the time, init containers are used to preload resources or load configuration for the main container.

When kube-scheduler schedules a Pod with multiple init containers, it counts only the init container with the largest cpu.request rather than the sum of all init containers. (Init containers can also set requests and limits.)

● Because multiple init containers run sequentially and exit as soon as they finish, the resource requirement of the init container that requests the most resources is the requirement for all init containers.

● When kube-scheduler calculates the resources occupied on a node, init container resources are still included, because init containers may be executed again under certain circumstances, for example when the sandbox is rebuilt because the image was changed.
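
The rule for init containers can be written down as a small calculation. The sketch assumes the Pod's effective CPU request is the larger of the biggest init container request and the sum of the main containers' requests; the function name and the millicore values are made up for the example.

```python
def effective_cpu_request(init_requests_milli, app_requests_milli):
    """Effective pod CPU request: max(largest init request, sum of app requests)."""
    largest_init = max(init_requests_milli, default=0)  # init containers run one at a time
    return max(largest_init, sum(app_requests_milli))   # app containers run together


# One init container asks for more than all main containers combined.
print(effective_cpu_request([200, 1000, 300], [250, 250]))

# The init container is small, so the main containers dominate.
print(effective_cpu_request([100], [250, 250]))
```

Summing the init containers (1500m in the first case) would overstate the need, since they never run at the same time.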

Copyright notice
This article was written by [Zhang quandan, Foxconn quality inspector]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/202/202207200143574962.html