Environment:
This example uses one master node and two worker nodes.
Kubernetes version: v1.17.5 (the most stable release at the time of writing)
Docker version: 19.03.12
- At least 2 CPU cores
- At least 2 GB of memory
- Static IP addresses; the gateway must be set in the network configuration file
- Swap must be disabled
Role | Hostname | IP | Gateway |
---|---|---|---|
master node | k8s-offline-master01 | 192.168.231.30 | 192.168.231.2 |
worker node | k8s-offline-node01 | 192.168.231.31 | 192.168.231.2 |
worker node | k8s-offline-node02 | 192.168.231.32 | 192.168.231.2 |
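Before starting, a quick sanity check along these lines can confirm the prerequisites on each host (a sketch; the thresholds come from the requirements listed above):

```shell
#!/bin/sh
# Check CPU core count (at least 2 required)
echo "CPU cores: $(nproc)"
# Check total memory (at least 2 GB required)
awk '/MemTotal/ {printf "Memory: %d MB\n", $2/1024}' /proc/meminfo
# Swap should be off: this should print no entries
swapon --summary 2>/dev/null || true
# A default gateway must be configured
ip route show default 2>/dev/null || true
```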
Obtaining the installation package
Two ways to download the offline installation package are provided below; either one works.
# Option 1
wget http://101.201.81.45:8888/CentOS_6.5/zcbus/docker/k8sOfflineSetup-v1.17.5.tar.gz
# Option 2
Link: https://pan.baidu.com/s/19rzKmikM_yEjQHYAfzxU8g  Password: lc5g
The package contains:
- Docker offline package
- kubeadm, kubelet, and kubectl binaries
- Docker images required by the master node
- Docker images required by the worker nodes
- flannel network plugin YAML file and Docker images
I. Upload the installation package
Upload the package to a directory of your choice and extract it:
tar -xzvf k8sOfflineSetup-v1.17.5.tar.gz
II. Install Docker offline (all nodes)
The Docker offline package is in the kubernetes/docker directory; the version is 19.03.12.
# 1. Extract
cd kubernetes/docker
tar -xzvf docker-19.03.12.tgz
# 2. Move the extracted Docker binaries to /usr/bin/
cp docker/* /usr/bin/
# 3. Register Docker as a systemd service
# Note: change --insecure-registry=192.168.231.30 below to your own server IP
cat > /etc/systemd/system/docker.service << EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd --selinux-enabled=false --insecure-registry=192.168.231.30 --exec-opt native.cgroupdriver=systemd
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
EOF
# Set permissions on the unit file, then start Docker
chmod 644 /etc/systemd/system/docker.service
# Reload the systemd unit files
systemctl daemon-reload
# Start Docker
systemctl start docker
# Enable start on boot
systemctl enable docker.service
# Check status
systemctl status docker
# Configure a registry mirror (not needed on an air-gapped network; it would be unreachable anyway)
cat > /etc/docker/daemon.json <<EOF
{"registry-mirrors": ["http://hub-mirror.c.163.com"]}
EOF
# Restart Docker to apply daemon.json
systemctl restart docker
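To confirm the installation worked, the following checks are harmless to run (guarded, so they only degrade to a message where Docker is missing):

```shell
# Verify the client and daemon are up
command -v docker >/dev/null 2>&1 && docker version || echo "docker not on PATH"
# Confirm the cgroup driver is systemd, matching the unit file above
command -v docker >/dev/null 2>&1 && docker info 2>/dev/null | grep -i cgroup || true
```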
III. Install Kubernetes
1. Host configuration (all nodes)
Set the hostname and disable SELinux, swap, etc.; required on every host.
# Set hostnames
# 1. Give each machine its hostname
hostnamectl set-hostname k8s-offline-master01
hostnamectl set-hostname k8s-offline-node01
hostnamectl set-hostname k8s-offline-node02
# Check the hostname
hostname
# Configure the IP-to-hostname mappings
cat >> /etc/hosts << EOF
192.168.231.30 k8s-offline-master01
192.168.231.31 k8s-offline-node01
192.168.231.32 k8s-offline-node02
EOF
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config && setenforce 0
# Disable swap
swapoff -a
yes | cp /etc/fstab /etc/fstab_bak
cat /etc/fstab_bak | grep -v swap > /etc/fstab
# Raise file-descriptor and process limits
echo "* soft nofile 190000" >> /etc/security/limits.conf
echo "* hard nofile 200000" >> /etc/security/limits.conf
echo "* soft nproc 252144" >> /etc/security/limits.conf
echo "* hard nproc 262144" >> /etc/security/limits.conf
# Configure kernel parameters
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
sysctl --system
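After `sysctl --system`, the values can be read back straight from /proc (a sketch; the bridge entries only appear once the br_netfilter kernel module is loaded, which is an assumption about your kernel):

```shell
# IP forwarding must be 1
echo "ip_forward = $(cat /proc/sys/net/ipv4/ip_forward)"
# Bridge netfilter settings exist only when br_netfilter is loaded
for f in /proc/sys/net/bridge/bridge-nf-call-iptables \
         /proc/sys/net/bridge/bridge-nf-call-ip6tables; do
    [ -f "$f" ] && echo "$f = $(cat "$f")" || echo "$f missing (try: modprobe br_netfilter)"
done
```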
2. Install kubeadm (all nodes)
Install the kubelet, kubeadm, and kubectl tools on all hosts.
Package location: kubernetes/packages
cd kubernetes/packages
cp -ar kubelet kubeadm kubectl /usr/bin/
# Install the kubelet systemd unit files
cp -r kubelet.service* /usr/lib/systemd/system/
# Install the CNI plugins
mv cni /opt/
# Start kubelet
systemctl enable kubelet && systemctl restart kubelet
# Check status
systemctl status kubelet
3. Master node configuration
Import the required images on the master node.
# 1. Import the Kubernetes images
[root@k8s-offline-master01 ~]# cd kubernetes/images/
[root@k8s-offline-master01 images]# docker load < k8s-master-v1.17.5.tar
# Check
[root@k8s-offline-master01 images]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.17.5 e13db435247d 4 months ago 116MB
k8s.gcr.io/kube-apiserver v1.17.5 f640481f6db3 4 months ago 171MB
k8s.gcr.io/kube-controller-manager v1.17.5 fe3d691efbf3 4 months ago 161MB
k8s.gcr.io/kube-scheduler v1.17.5 f648efaff966 4 months ago 94.4MB
quay-mirror.qiniu.com/coreos/flannel v0.12.0-amd64 4e9f801d2217 5 months ago 52.8MB
k8s.gcr.io/coredns 1.6.5 70f311871ae1 10 months ago 41.6MB
k8s.gcr.io/etcd 3.4.3-0 303ce5db0e90 10 months ago 288MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
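A small loop can confirm that every required image landed in the local cache (a sketch; the image list mirrors the `docker images` output above, and the check is skipped where Docker is unavailable):

```shell
for img in k8s.gcr.io/kube-apiserver:v1.17.5 \
           k8s.gcr.io/kube-controller-manager:v1.17.5 \
           k8s.gcr.io/kube-scheduler:v1.17.5 \
           k8s.gcr.io/kube-proxy:v1.17.5 \
           quay-mirror.qiniu.com/coreos/flannel:v0.12.0-amd64 \
           k8s.gcr.io/coredns:1.6.5 \
           k8s.gcr.io/etcd:3.4.3-0 \
           k8s.gcr.io/pause:3.1; do
    if command -v docker >/dev/null 2>&1; then
        docker image inspect "$img" >/dev/null 2>&1 \
            && echo "present: $img" || echo "MISSING: $img"
    else
        echo "docker not on PATH, cannot check $img"
    fi
done
```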
Create the initialization configuration file with `kubeadm config print init-defaults`, then edit it.
Four changes are needed:
- advertiseAddress: change 0.0.0.0 to the IP of the current master host
- kubernetesVersion: change v1.17.0 to v1.17.5, the same version as the imported images; if the versions do not match, kubeadm will try to pull images from the registry again
- Add the flannel network configuration: podSubnet: "10.244.0.0/16" must be set, otherwise the flannel pods will keep restarting later
- Enable IPVS routing by appending a KubeProxyConfiguration document at the end
[root@k8s-offline-master01 yaml]# kubeadm config print init-defaults > kubeadm-config.yaml
W0906 19:13:52.340400 2265 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0906 19:13:52.340456 2265 validation.go:28] Cannot validate kubelet config - no validator is available
# Edit the configuration file as follows
[root@k8s-offline-master01 yaml]# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.231.30   # change to your master host IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-offline-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.17.5   # must match the version of the imported images, otherwise kubeadm pulls from the registry again
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"   # flannel network configuration; must be set, otherwise flannel keeps restarting
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
# Enable IPVS routing
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
Initialize the master01 node.
The tee command appended here writes the initialization log to kubeadm-init.log for later use:
kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log
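Right after a successful init, kubectl on the master still needs a kubeconfig before `kubectl get node` will work; the commands below mirror the ones in the troubleshooting section at the end, guarded so they are safe to re-run:

```shell
# Point kubectl at the cluster-admin credentials written by kubeadm init
mkdir -p "$HOME/.kube"
# admin.conf only exists on an initialized master
if [ -f /etc/kubernetes/admin.conf ]; then
    sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
    sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
fi
```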
[root@k8s-offline-master01 yaml]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-offline-master01 NotReady master 15m v1.17.5
4. Worker node configuration
Import the images worker nodes need, on every worker node:
[root@k8s-offline-node01 kubernetes]# cd images/
[root@k8s-offline-node01 images]# docker load < k8s-node-v1.17.5.tar
[root@k8s-offline-node01 images]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.17.5 e13db435247d 4 months ago 116MB
quay-mirror.qiniu.com/coreos/flannel v0.12.0-amd64 4e9f801d2217 5 months ago 52.8MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
Join the nodes to the cluster
Clean up the environment first, then run kubeadm join (on the worker nodes).
There are two ways to obtain the join command:
- Read it from the master initialization log
- Generate it with kubeadm token create --print-join-command
# Read the join command from the master init log
[root@k8s-offline-master01 yaml]# cat kubeadm-init.log
...
kubeadm join 192.168.231.30:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd
# The token above is valid for 24 hours
# The default token TTL is 24 hours; after it expires the token can no longer be used. Run kubeadm token create on the master node,
# or use the following command on the master to print a complete join command
kubeadm token create --print-join-command
[root@k8s-offline-master01 yaml]# kubeadm token create --print-join-command
kubeadm join 192.168.231.30:6443 --token 2qg8vf.17xe7vytsnhaed5a --discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd
# Run the command above on the worker node
[root@k8s-offline-node01 ~]# kubeadm join 192.168.231.30:6443 --token 2qg8vf.17xe7vytsnhaed5a --discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd
W0907 03:09:15.899230 1824 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING FileExisting-socat]: socat not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0907 03:09:17.202173 1824 common.go:148] WARNING: could not obtain a bind address for the API Server: no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"; using: 0.0.0.0
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
# Check
[root@k8s-offline-master01 yaml]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-offline-master01 NotReady master 44m v1.17.5
k8s-offline-node01 NotReady <none> 17s v1.17.5
5. Install the flannel network plugin
Pod-to-pod communication across hosts requires a network plugin (flannel, Calico, Weave, etc.); here we use flannel.
# Run on the master node
[root@k8s-offline-master01 yaml]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
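Once the manifests are applied, the flannel DaemonSet pods should reach Running and the nodes flip from NotReady to Ready. A guarded check (it degrades to a message where kubectl is unavailable; the `app=flannel` label is the one used by this kube-flannel.yml):

```shell
# Watch the flannel pods come up; nodes become Ready once the CNI is in place
if command -v kubectl >/dev/null 2>&1; then
    kubectl get pods -n kube-system -l app=flannel -o wide
    kubectl get nodes
else
    echo "kubectl not on PATH"
fi
```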
IV. Troubleshooting
Problems you may run into during deployment; if you follow the steps above exactly, you should not hit any of them.
1. kubelet status is abnormal before Kubernetes initialization
Before the master is initialized, kubelet will not start properly; once the images are imported and initialization completes, kubelet becomes healthy. The same applies to worker nodes.
# kubelet status before Kubernetes initialization
[root@k8s-offline-master01 yaml]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Mon 2020-09-07 02:05:24 CST; 486ms ago
Docs: https://kubernetes.io/docs/
Process: 9436 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 9436 (code=exited, status=255)
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: Unit kubelet.service entered failed state.
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: kubelet.service failed.
# Show detailed logs
journalctl -xefu kubelet
# After kubeadm init runs, the kubelet status is normal
[root@k8s-offline-master01 ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2020-09-07 02:26:33 CST; 1min 59s ago
Docs: https://kubernetes.io/docs/
Main PID: 13440 (kubelet)
Tasks: 16
Memory: 30.4M
CGroup: /system.slice/kubelet.service
└─13440 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...
Sep 07 02:28:08 k8s-offline-master01 kubelet[13440]: W0907 02:28:08.855007 13440 cni.go:237] Unable to update cni config: no networks found...i/net.d
Sep 07 02:28:09 k8s-offline-master01 kubelet[13440]: E0907 02:28:09.686755 13440 kubelet.go:2183] Container runtime network not ready: Netw...ialized
Sep 07 02:28:13 k8s-offline-master01 kubelet[13440]: W0907 02:28:13.856195 13440 cni.go:237] Unable to update cni config: no networks found...i/net.d
Sep 07 02:28:14 k8s-offline-master01 kubelet[13440]: E0907 02:28:14.693244 13440 kubelet.go:2183] Container runtime network not ready: Netw...ialized
Hint: Some lines were ellipsized, use -l to show in full.
2. kubeadm init still tries to pull images from the registry although they were imported offline
This one is a trap: the generated configuration file defaults to version v1.17.0, while our kubeadm and images are v1.17.5, so the v1.17.0 images cannot be found locally during initialization.
[root@k8s-offline-master01 yaml]# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log
W0907 02:25:03.765048 12114 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"kubeProxyConfiguration"} for scheme definitions in "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31" and "k8s.io/kubernetes/cmd/kubeadm/app/componentconfigs/scheme.go:28"
W0907 02:25:03.765245 12114 validation.go:28] Cannot validate kubelet config - no validator is available
W0907 02:25:03.765255 12114 validation.go:28] Cannot validate kube-proxy config - no validator is available
[config] WARNING: Ignored YAML document with GroupVersionKind kubeproxy.config.k8s.io/v1alpha1, Kind=kubeProxyConfiguration
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
[WARNING FileExisting-socat]: socat not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:33440->[::1]:53: read: connection refused
, error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:51817->[::1]:53: read: connection refused
, error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:40228->[::1]:53: read: connection refused
, error: exit status 1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:57106->[::1]:53: read: connection refused
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Fix
# The configuration file was generated with the following command
kubeadm config print init-defaults > kubeadm-config.yaml
...
kubernetesVersion: v1.17.0
...
The generated version is v1.17.0, while kubeadm and the imported images are v1.17.5; kubeadm determines the image versions it needs from kubernetesVersion.
# Solution:
Change it to
kubernetesVersion: v1.17.5
# Note: do not change the default image repository path.
3. kubectl reports connection refused
The problem:
[k8s@k8s-offline-master01 ~]$ kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Fix:
# kubectl will not work until the following commands have been run.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Or:
[root@httk3 ~]# ll /etc/kubernetes/admin.conf
-rw------- 1 root root 5449 Dec 6 18:42 /etc/kubernetes/admin.conf
[root@httk3 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@httk3 yaml]# source ~/.bash_profile
[root@httk3 yaml]# kubectl get node
NAME STATUS ROLES AGE VERSION
httk3 NotReady master 30m v1.17.5
If other nodes need kubectl as well, copy admin.conf to their /etc/kubernetes/ directory with scp.
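That copy step can be sketched as follows (k8s-offline-node01 is the example target from the table at the top; the command is printed rather than executed to avoid accidental copies):

```shell
# Copy the admin kubeconfig from the master to another node
NODE=k8s-offline-node01
# Drop the leading 'echo' to actually perform the copy
echo scp /etc/kubernetes/admin.conf root@"$NODE":/etc/kubernetes/
```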
4. Adding the network plugin
This one is a classic: in my test environment the interface configuration file ifcfg-ens33 only had an IP address and no gateway. flannel first checks whether a gateway interface is explicitly specified; if not, it falls back to the default route, and when no gateway is configured at all, flannel cannot start. See the source snippet below.
Normal SandboxChanged 113s (x12 over 2m27s) kubelet, k8s-offline-node01 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 113s (x5 over 118s) kubelet, k8s-offline-node01 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-5xnxj": Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/docker/containerd/containerd.sock: connect: connection refused": unavailable
[root@k8s-offline-master01 yaml]# kubectl logs -f kube-flannel-ds-amd64-8vhf6 -n kube-system
I0906 20:33:56.654980 1 main.go:518] Determining IP address of default interface
E0906 20:33:56.655099 1 main.go:204] Failed to find any valid interface to use: failed to get default interface: Unable to find default route
[root@k8s-offline-master01 yaml]#
Fix:
The worker node had no gateway configured; add it to the interface configuration file:
NETMASK=255.255.255.0
GATEWAY=192.168.231.2
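For reference, a minimal static ifcfg-ens33 with the gateway set might look like this (device name and addresses are this example's; adjust to your environment):

```
TYPE=Ethernet
BOOTPROTO=static
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.231.31
NETMASK=255.255.255.0
GATEWAY=192.168.231.2
```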
Source:
...
} else { // no interface specified; flannel decides on its own
    log.Info("Determining IP address of default interface")
    // Linux: find the gateway interface from the route table
    // Windows: query via `netsh interface ipv4 show addresses`
    if iface, err = ip.GetDefaultGatewayIface(); err != nil {
        return nil, fmt.Errorf("failed to get default interface: %s", err)
    }
...
# Reference:
https://blog.csdn.net/xxb249/article/details/86422165