The Workflow of Creating a Pod
Key Concepts
001 Kubernetes uses a controller architecture built on the list-watch mechanism, which decouples the interactions between components
002 Each component watches the resources it is responsible for; when those resources change, kube-apiserver notifies that component. The process is similar to publish/subscribe (a quick client-side illustration follows below)
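The watch behaviour that the controllers rely on can also be observed from the client side. A minimal illustration using only standard kubectl flags (the exact output depends on the cluster):
#Stream change notifications for Pods instead of polling;
#kube-apiserver pushes ADDED/MODIFIED/DELETED events as the objects change
kubectl get pods --watch
#Cluster events can be streamed the same way
kubectl get events --watch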
Workflow Diagram

(workflow diagram image: 1636811426331.png)
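The diagram itself is not reproduced here; the standard flow it depicts is roughly:
001 kubectl submits the Pod manifest to kube-apiserver, which validates it and persists the object in etcd
002 kube-scheduler watches for Pods that have no node assigned, selects a suitable Node, and writes the binding back through kube-apiserver
003 The kubelet on the selected Node watches for Pods bound to it and asks the container runtime to create the containers
004 The kubelet reports the Pod status back to kube-apiserver, which stores it in etcd; kubectl get pod then shows the result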
Main Pod Attributes That Affect Scheduling
Basis for resource scheduling
resources: {}
Scheduling policies
schedulerName: default-scheduler
nodeName: ""
nodeSelector: {}
affinity: {}
tolerations: []

(Pod scheduling attributes diagram: 1636856892421.png)
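For orientation, a minimal skeleton showing where the fields above sit in a Pod manifest (the Pod name and nginx image are placeholders; empty values simply mean "not set"):
apiVersion: v1
kind: Pod
metadata:
  name: pod-sched-demo                 # placeholder name, for illustration only
spec:
  schedulerName: default-scheduler     # which scheduler is responsible for this Pod
  nodeName: ""                         # bind directly to a named node, bypassing the scheduler
  nodeSelector: {}                     # node labels this Pod requires
  affinity: {}                         # node/pod affinity and anti-affinity rules
  tolerations: []                      # taints this Pod tolerates
  containers:
  - name: web
    image: nginx
    resources: {}                      # requests/limits, the basis for resource scheduling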
How Resource Requests and Limits Affect Pod Scheduling
Related Parameters
001 Container resource limits (the maximum resources the container may use)
• resources.limits.cpu
• resources.limits.memory
002 Container resource requests: the minimum resources the container needs, used as the basis for allocating resources during scheduling
• resources.requests.cpu
• resources.requests.memory
003 CPU units: either millicores (m) or a decimal number, e.g. 0.5 = 500m, 1 = 1000m
004 Kubernetes uses the request values to find a Node with enough free resources to schedule the Pod
Example 1: Define a Pod that cannot be scheduled
apiVersion: v1
kind: Pod
metadata:
  name: pod-resource2
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        memory: "4Gi"
        cpu: "2000m"
------------------------------------------------------------------------------
[root@k8smaster pod]# kubectl apply -f pod-resource2.yaml
pod/pod-resource2 created
#Not running yet; there are two common possibilities:
#001 No node has enough free resources for the Pod
#002 The image is still being pulled
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-resource2 0/1 Pending 0 56s
#Check why it cannot be scheduled: not enough memory
[root@k8smaster pod]# kubectl describe pod pod-resource2
0/2 nodes are available: 2 Insufficient memory.
#Check the node's resource allocation
[root@k8smaster pod]# kubectl describe node k8snode1
#Resources available for allocation
Allocatable:
cpu: 4
ephemeral-storage: 15258982785
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2759772Ki
pods: 110
#Resources already allocated
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 550m (13%) 100m (2%)
memory 200Mi (7%) 400Mi (14%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
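A quick sanity check against the numbers above (rounded; this reasoning is implied rather than stated in the output): the node's allocatable memory is 2759772Ki ≈ 2695Mi, of which about 200Mi is already requested, leaving roughly 2.4Gi. The new Pod requests 4Gi, so neither node can satisfy it, hence "2 Insufficient memory".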
Example 2: Resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: pod-resource
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:          # minimum resources the container needs (scheduling basis)
        memory: "64Mi"
        cpu: "250m"
      limits:            # maximum resources the container may use
        memory: "128Mi"
        cpu: "500m"
-----------------------------------------------------------------------------
[root@k8smaster pod]# kubectl apply -f pod-resource.yaml
pod/pod-resource created
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-resource 1/1 Running 0 50s
nodeSelector & nodeAffinity
nodeSelector Overview
001 Used to schedule the Pod onto Nodes whose labels match; if no node has a matching label, scheduling fails and the Pod stays Pending
002 Constrains the Pod to run on specific nodes
003 Requires an exact match of the node labels
nodeSelector Use Cases
001 Dedicated nodes: group and manage Nodes by business line
002 Special hardware: some Nodes have SSDs or GPUs
nodeSelector example: make sure the Pod lands on a node with an SSD
Add a label to the node
Format: kubectl label nodes <node-name> <label-key>=<label-value>
[root@k8smaster pod]# kubectl label nodes k8snode1 disktype=ssd
node/k8snode1 labeled
#Verify
[root@k8smaster pod]# kubectl get nodes --show-labels |grep k8snode1
k8snode1 Ready node 14d v1.19.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8snode1,kubernetes.io/os=linux,node-role.kubernetes.io/node=
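For completeness (this step is not part of the lab output above), a label added this way can be removed again by appending - to the key:
kubectl label nodes k8snode1 disktype-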
#yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector
spec:
  nodeSelector:
    disktype: "ssd"
  containers:
  - name: web
    image: nginx
------------------------------------------------------------------
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-nodeselector 1/1 Running 0 18s
001 The Pod can only be scheduled onto nodes that carry the label disktype: "ssd"
#yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-nodeselector2
spec:
  nodeSelector:
    gpu: "nvidia"
  containers:
  - name: web
    image: nginx
-----------------------------------------------------------------------
[root@k8smaster pod]# kubectl apply -f pod-nodeselector2.yaml
pod/pod-nodeselector2 created
#Cannot be scheduled
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-nodeselector2 0/1 Pending 0 10s
#Reason
[root@k8smaster pod]# kubectl describe pod pod-nodeselector2
0/2 nodes are available: 2 node(s) didn't match node selector.
nodeAffinity Overview
001 Node affinity serves the same purpose as nodeSelector, but is more flexible
002 With a soft policy, nodes that match are preferred; if none match, the Pod can still be scheduled elsewhere
003 Supports richer matching logic, not just exact string equality
004 Scheduling rules can be soft or hard, rather than only hard requirements
• Hard (required): must be satisfied
• Soft (preferred): the scheduler tries to satisfy it, but does not guarantee it
005 Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt (a sketch using Exists and Gt follows below)
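A sketch of the less common operators (illustrative only; the label keys gpu and cpu-core-count are assumptions, and Gt compares the label value as an integer):
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity-ops        # hypothetical example, not part of the lab above
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu                 # Exists: the node must have this label key; values are omitted
            operator: Exists
          - key: cpu-core-count      # Gt: the label value, parsed as an integer, must be greater than 8
            operator: Gt
            values:
            - "8"
  containers:
  - name: web
    image: nginx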
nodeAffinity Example 1: soft policy (preferred)
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity
spec:
  affinity:
    nodeAffinity:
      #requiredDuringSchedulingIgnoredDuringExecution:
      #  nodeSelectorTerms:
      #  - matchExpressions:
      #    - key: gpu
      #      operator: In
      #      values:
      #      - nvidia-tesla
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvidia
  containers:
  - name: web
    image: nginx
---------------------------------------------------------------------------
#No node matches the preference, but because it is a soft policy the Pod is still scheduled and runs
[root@k8smaster pod]# kubectl apply -f pod-node-affinity.yaml
pod/pod-node-affinity created
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-node-affinity 1/1 Running 0 30s
nodeAffinity Example 2: hard policy (required)
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-affinity2
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvidia
      #preferredDuringSchedulingIgnoredDuringExecution:
      #- weight: 1
      #  preference:
      #    matchExpressions:
      #    - key: gpu
      #      operator: In
      #      values:
      #      - nvidia
  containers:
  - name: web
    image: nginx
------------------------------------------------------------------------------
[root@k8smaster pod]# kubectl apply -f pod-node-affinity2.yaml
pod/pod-node-affinity2 created
#Cannot be scheduled
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod-node-affinity2 0/1 Pending 0 5s
#Reason
[root@k8smaster pod]# kubectl describe pod pod-node-affinity2
0/2 nodes are available: 2 node(s) didn't match node selector.
Taints and Tolerations
Overview
Taints
001 Keep Pods away from specific Nodes
002 Once a node is tainted, no Pod is scheduled onto it unless the Pod tolerates the taint
Tolerations
001 Allow a Pod to be scheduled onto Nodes that carry matching taints
002 A pessimistic approach: tainted nodes reject all Pods by default, but a Pod that declares a matching toleration can still be scheduled onto them
Use Cases
001 Dedicated nodes: group Nodes by business line; by default nothing is scheduled onto them, and only Pods with a matching toleration are allowed
002 Special hardware: some Nodes have SSDs or GPUs; by default nothing is scheduled onto them, and only Pods with a matching toleration are allowed
003 Taint-based eviction
Add a taint to a node
Format: kubectl taint node [node] key=value:[effect]
where [effect] can be one of:
• NoSchedule: Pods without a matching toleration will never be scheduled here
• PreferNoSchedule: try to avoid scheduling here; a toleration is not strictly required
• NoExecute: new Pods are not scheduled, and existing Pods on the Node are evicted (see the sketch after this list)
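A sketch of how NoExecute interacts with tolerations (the maintenance taint key, the Pod name, and the 300s value are made up for illustration): a Pod that tolerates the taint may stay on the node, and tolerationSeconds bounds how long it stays before being evicted.
#Illustrative: existing Pods without a matching toleration are evicted from the node
kubectl taint node k8snode1 maintenance=true:NoExecute
#yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-noexecute-demo          # hypothetical name
spec:
  tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 300          # may stay on the tainted node for at most 300s before eviction
  containers:
  - name: web
    image: nginx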
#Add a taint
[root@k8smaster pod]# kubectl taint node k8snode1 gpu=yes:NoSchedule
node/k8snode1 tainted
#View the taint
[root@k8smaster pod]# kubectl describe node k8snode1 |grep Taint
Taints: gpu=yes:NoSchedule
#Remove the taint:
kubectl taint node [node] key:[effect]-
[root@k8smaster pod]# kubectl taint node k8snode1 gpu-
node/k8snode1 untainted
[root@k8smaster pod]# kubectl describe node k8snode1 |grep Taint
Taints: <none>
Example: test taints and tolerations
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: web
    image: nginx
-------------------------------------------------------------------------------------
#Cannot be scheduled: both nodes carry taints that the Pod does not tolerate
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod2 0/1 Pending 0 7s
[root@k8smaster pod]# kubectl describe pod pod2
0/2 nodes are available: 1 node(s) had taint {gpu: yes}, that the pod didn't tolerate, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
#Configure a toleration
apiVersion: v1
kind: Pod
metadata:
  name: pod3
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "nvidia"
    effect: "NoSchedule"
  containers:
  - name: web
    image: nginx
---------------------------------------------------------------------
#Still cannot be scheduled, because the toleration value (nvidia) does not match the taint value (yes)
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod3 0/1 Pending 0 4s
#Configure a toleration whose value matches the taint
apiVersion: v1
kind: Pod
metadata:
  name: pod5
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "yes"
    effect: "NoSchedule"
  containers:
  - name: web
    image: nginx
-------------------------------------------------------------------
[root@k8smaster pod]# kubectl apply -f pod5.yaml
pod/pod5 created
#Scheduled successfully
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod5 1/1 Running 0 29s
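Besides Equal, the operator can also be Exists, in which case value is omitted and any value of that taint key is tolerated (a minimal sketch; the Pod name is made up):
apiVersion: v1
kind: Pod
metadata:
  name: pod-tolerate-exists         # hypothetical name
spec:
  tolerations:
  - key: "gpu"
    operator: "Exists"              # tolerates gpu=<any value>:NoSchedule
    effect: "NoSchedule"
  containers:
  - name: web
    image: nginx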
nodeName
Key Concepts
001 Specifies a node by name, placing the Pod onto that Node directly, without going through the scheduler
002 Even if the node has taints, the Pod can still be placed on it
Example
apiVersion: v1
kind: Pod
metadata:
  name: pod6
spec:
  nodeName: k8snode1
  containers:
  - name: web
    image: nginx
-------------------------------------------------------------
#k8snode1 carries a taint
[root@k8smaster pod]# kubectl describe node k8snode1 |grep Taint
Taints: gpu=yes:NoSchedule
[root@k8smaster pod]# kubectl apply -f pod6.yaml
pod/pod6 created
#Placed successfully
[root@k8smaster pod]# kubectl get pod
NAME READY STATUS RESTARTS AGE
pod6 1/1 Running 0 5s
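To double-check which node the Pod landed on (a follow-up check, not part of the original output), the -o wide output includes a NODE column:
kubectl get pod pod6 -o wide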