Environment

This example uses one master node and two worker nodes.
The Kubernetes version is v1.17.5, the most stable release at the time of writing.
The Docker version is 19.03.12.

  1. At least 2 CPU cores
  2. At least 2 GB of memory
  3. A static IP address must be configured, and the gateway must be set in the interface config file (see the example after the table below)
  4. Swap must be disabled
Role          Hostname               IP               Gateway
master node   k8s-offline-master01   192.168.231.30   192.168.231.2
worker node   k8s-offline-node01     192.168.231.31   192.168.231.2
worker node   k8s-offline-node02     192.168.231.32   192.168.231.2
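
For requirement 3, a minimal static IP configuration sketch (the interface name ens33 and the file path are assumptions from a typical CentOS setup; substitute your own interface name and the addresses from the table above):

# /etc/sysconfig/network-scripts/ifcfg-ens33  (example values for the master node)
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.231.30
NETMASK=255.255.255.0
GATEWAY=192.168.231.2   # the gateway must be written, otherwise flannel will fail to start later (see Problem 4)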

Getting the installation package

Two ways to download the offline installation package are provided below; either one will do.
# Option 1
wget  http://101.201.81.45:8888/CentOS_6.5/zcbus/docker/k8sOfflineSetup-v1.17.5.tar.gz
# Option 2
Link: https://pan.baidu.com/s/19rzKmikM_yEjQHYAfzxU8g  Password: lc5g

The installation package contains:

  1. Docker offline package
  2. kubeadm, kubelet, and kubectl binaries
  3. Docker images required by the master node
  4. Docker images required by the worker nodes
  5. The flannel network plugin YAML file and Docker image

I. Upload the installation package

Upload the installation package to the target directory and extract it:

tar -xzvf k8sOfflineSetup-v1.17.5.tar.gz
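
As a quick orientation, you can list the extracted contents; based on the paths used later in this document, the top-level directories should include at least the following:

ls kubernetes/
# expected directories (inferred from the commands below): docker/  images/  packages/  ...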

II. Install Docker offline (all nodes)

The Docker offline package is in the kubernetes/docker directory; the version is 19.03.12.

# 1. Extract the package
cd kubernetes/docker
tar -xzvf docker-19.03.12.tgz 
# 2. Copy the extracted docker binaries to /usr/bin/
cp docker/* /usr/bin/
# 3. Register docker as a systemd service
# Note: change --insecure-registry=192.168.231.30 below to your own registry server's IP
cat > /etc/systemd/system/docker.service << EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd --selinux-enabled=false --insecure-registry=192.168.231.30 --exec-opt native.cgroupdriver=systemd
ExecReload=/bin/kill -s HUP \$MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
EOF
# Set permissions on the unit file (644 is sufficient; it does not need to be executable)
chmod 644 /etc/systemd/system/docker.service
# Reload systemd unit files
systemctl daemon-reload
# Start Docker
systemctl start docker
# Enable Docker at boot
systemctl enable docker.service
# Check status
systemctl status docker
# Configure a registry mirror (not needed on an isolated network, where it cannot be reached anyway)
cat > /etc/docker/daemon.json <<EOF
{"registry-mirrors": ["http://hub-mirror.c.163.com"]}
EOF
# Restart Docker
systemctl restart docker
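
As a quick sanity check, confirm that Docker is running and that the cgroup driver matches the native.cgroupdriver=systemd option set in the unit file above (Docker and kubelet must agree on the cgroup driver):

docker version
docker info | grep -i "cgroup driver"   # should print: Cgroup Driver: systemd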

III. Install Kubernetes

1. Host configuration (all nodes)

Set the hostname and disable SELinux, swap, and so on. These settings are required on every host.

# Set hostnames
# 1. Run the matching command on each machine
hostnamectl set-hostname k8s-offline-master01
hostnamectl set-hostname k8s-offline-node01
hostnamectl set-hostname k8s-offline-node02
# Verify the hostname
hostname
# Configure IP-to-hostname mappings
cat >> /etc/hosts << EOF
192.168.231.30 k8s-offline-master01
192.168.231.31 k8s-offline-node01
192.168.231.32 k8s-offline-node02
EOF
# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config && setenforce 0
# Disable swap
swapoff -a
yes | cp /etc/fstab /etc/fstab_bak
cat /etc/fstab_bak | grep -v swap > /etc/fstab
# Raise open-file and process limits
echo "* soft nofile 190000" >> /etc/security/limits.conf
echo "* hard nofile 200000" >> /etc/security/limits.conf
echo "* soft nproc 252144" >> /etc/security/limits.conf
echo "* hard nproc 262144" >> /etc/security/limits.conf
# Configure kernel parameters
cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
sysctl --system
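
On some systems the net.bridge.* keys above only exist after the br_netfilter kernel module has been loaded; if sysctl --system reports "No such file or directory" for them, a sketch for loading the module and keeping it loaded across reboots:

modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf
sysctl --system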

2. Install kubeadm (all nodes)

Install the kubelet, kubeadm, and kubectl tools on every host.
Package location: kubernetes/packages

cd kubernetes/packages
cp -ar kubelet kubeadm kubectl /usr/bin/
# Install the kubelet systemd unit files
cp -r kubelet.service* /usr/lib/systemd/system/
# Install the CNI plugins
mv cni /opt/
# Enable and start kubelet
systemctl enable kubelet && systemctl restart kubelet
# Check status
systemctl status kubelet

3. Master node configuration

Import the required images on the master node.

# 1. Import the Kubernetes images
[root@k8s-offline-master01 ~]# cd kubernetes/images/
[root@k8s-offline-master01 images]# docker load < k8s-master-v1.17.5.tar 
# Verify
[root@k8s-offline-master01 images]# docker images 
REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                  v1.17.5             e13db435247d        4 months ago        116MB
k8s.gcr.io/kube-apiserver              v1.17.5             f640481f6db3        4 months ago        171MB
k8s.gcr.io/kube-controller-manager     v1.17.5             fe3d691efbf3        4 months ago        161MB
k8s.gcr.io/kube-scheduler              v1.17.5             f648efaff966        4 months ago        94.4MB
quay-mirror.qiniu.com/coreos/flannel   v0.12.0-amd64       4e9f801d2217        5 months ago        52.8MB
k8s.gcr.io/coredns                     1.6.5               70f311871ae1        10 months ago       41.6MB
k8s.gcr.io/etcd                        3.4.3-0             303ce5db0e90        10 months ago       288MB
k8s.gcr.io/pause                       3.1                 da86e6ba6ca1        2 years ago         742kB

Create the initialization configuration file with the kubeadm config print command, then modify it as described below.

Four changes are needed:

  1. Change advertiseAddress: 0.0.0.0 to the current master host's IP.
  2. Change kubernetesVersion: v1.17.0 to v1.17.5, the same version as the imported images. It must match the imported version exactly, otherwise kubeadm will pull images from the registry again.
  3. Add the flannel network configuration:
    podSubnet: "10.244.0.0/16" must be set, otherwise the flannel pods will keep restarting later.
  4. Use IPVS routing by appending the extra document at the end.
[root@k8s-offline-master01 yaml]# kubeadm config print init-defaults > kubeadm-config.yaml
W0906 19:13:52.340400    2265 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0906 19:13:52.340456    2265 validation.go:28] Cannot validate kubelet config - no validator is available
# Edit the configuration file as follows
[root@k8s-offline-master01 yaml]# cat kubeadm-config.yaml 
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.231.30   # change to your master host's IP
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-offline-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.17.5  # must match the version of the imported images exactly, otherwise kubeadm will pull images from the registry again
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"  # flannel network configuration; must be set, otherwise the flannel pods will keep restarting later
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
# Configure kube-proxy to use IPVS routing
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
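
Note that kube-proxy only uses IPVS if the IPVS kernel modules are available, otherwise it silently falls back to iptables. A sketch for loading them (module names can differ slightly between kernel versions):

modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh
modprobe nf_conntrack_ipv4   # on kernels >= 4.19 the module is named nf_conntrack
lsmod | grep ip_vs           # verify the modules are loaded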

Initialize the master01 node

Pipe the output through tee to save the initialization log to kubeadm-init.log for later use.

kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log  
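# Configure kubectl access for the current user right after kubeadm init
# (the same commands appear in Problem 3 below); without this step the
# kubectl command below cannot reach the API server.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config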
[root@k8s-offline-master01 yaml]# kubectl get node 
NAME                   STATUS     ROLES    AGE   VERSION
k8s-offline-master01   NotReady   master   15m   v1.17.5

4. Worker node configuration

Import the Docker images required by the worker nodes on every worker node.

[root@k8s-offline-node01 kubernetes]# cd images/
[root@k8s-offline-node01 images]# docker load < k8s-node-v1.17.5.tar 
[root@k8s-offline-node01 images]# docker images
REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                  v1.17.5             e13db435247d        4 months ago        116MB
quay-mirror.qiniu.com/coreos/flannel   v0.12.0-amd64       4e9f801d2217        5 months ago        52.8MB
k8s.gcr.io/pause                       3.1                 da86e6ba6ca1        2 years ago         742kB

Join the nodes to the cluster

Clean the environment first, then run kubeadm join (executed on the worker nodes); a cleanup sketch follows below.
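
If a worker node was part of a cluster before, or a previous join attempt failed, a minimal cleanup sketch (run on the worker node; kubeadm reset removes the old cluster state, the remaining commands clear leftover CNI config, kubeconfig, and iptables rules):

kubeadm reset -f
rm -rf /etc/cni/net.d $HOME/.kube/config
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X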

There are two ways to obtain the join command:

  1. Read it from the master initialization log.
  2. Generate it with kubeadm token create --print-join-command.
# Read the join command from the master initialization log
[root@k8s-offline-master01 yaml]# cat kubeadm-init.log 
...
kubeadm join 192.168.231.30:6443 --token abcdef.0123456789abcdef \
   --discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd 
# The default token is valid for 24 hours; once it has expired it can no longer be used.
# Run kubeadm token create on the master node to issue a new one,
# or use the following command on the master node to print a complete join command
kubeadm token create --print-join-command
[root@k8s-offline-master01 yaml]# kubeadm token create --print-join-command 
kubeadm join 192.168.231.30:6443 --token 2qg8vf.17xe7vytsnhaed5a     --discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd 
# Run the command above on the worker node
[root@k8s-offline-node01 ~]# kubeadm join 192.168.231.30:6443 --token 2qg8vf.17xe7vytsnhaed5a     --discovery-token-ca-cert-hash sha256:9fa74759190dd58af46fc6ff7e807e68673472c0c0039bc3ee0f4240c43721fd 
W0907 03:09:15.899230    1824 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
       [WARNING FileExisting-socat]: socat not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0907 03:09:17.202173    1824 common.go:148] WARNING: could not obtain a bind address for the API Server: no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"; using: 0.0.0.0
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
# Verify
[root@k8s-offline-master01 yaml]# kubectl get nodes
NAME                   STATUS     ROLES    AGE   VERSION
k8s-offline-master01   NotReady   master   44m   v1.17.5
k8s-offline-node01     NotReady   <none>   17s   v1.17.5

5. Install the flannel network plugin

Pod-to-pod communication between different hosts in the cluster requires a network plugin such as flannel, Calico, or Weave; here we use flannel.

A comparison of the different network plugins is available elsewhere for reference.

# Run on the master node
[root@k8s-offline-master01 yaml]# kubectl apply -f kube-flannel.yml 
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
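
A quick check that the plugin came up: the flannel pods should reach Running and the nodes should turn Ready within a couple of minutes.

kubectl get pods -n kube-system -o wide | grep flannel
kubectl get nodes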

IV. Troubleshooting

Below are some issues that may come up during deployment; if you follow the steps above exactly, you should not run into them.

1. kubelet status is abnormal before Kubernetes is initialized

Before the master is initialized, kubelet will not start successfully. Once the images have been imported and initialization has completed, kubelet becomes healthy. The same applies to the worker nodes.


# kubelet status before Kubernetes initialization (abnormal)
[root@k8s-offline-master01 yaml]# systemctl status kubelet.service   
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-09-07 02:05:24 CST; 486ms ago
     Docs: https://kubernetes.io/docs/
  Process: 9436 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 9436 (code=exited, status=255)
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: Unit kubelet.service entered failed state.
Sep 07 02:05:24 k8s-offline-master01 systemd[1]: kubelet.service failed.
# To view the detailed logs, run
journalctl -xefu kubelet 
# After running kubeadm init, the kubelet status is normal
[root@k8s-offline-master01 ~]# systemctl status kubelet.service 
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2020-09-07 02:26:33 CST; 1min 59s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 13440 (kubelet)
    Tasks: 16
   Memory: 30.4M
   CGroup: /system.slice/kubelet.service
           └─13440 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...
Sep 07 02:28:08 k8s-offline-master01 kubelet[13440]: W0907 02:28:08.855007   13440 cni.go:237] Unable to update cni config: no networks found...i/net.d
Sep 07 02:28:09 k8s-offline-master01 kubelet[13440]: E0907 02:28:09.686755   13440 kubelet.go:2183] Container runtime network not ready: Netw...ialized
Sep 07 02:28:13 k8s-offline-master01 kubelet[13440]: W0907 02:28:13.856195   13440 cni.go:237] Unable to update cni config: no networks found...i/net.d
Sep 07 02:28:14 k8s-offline-master01 kubelet[13440]: E0907 02:28:14.693244   13440 kubelet.go:2183] Container runtime network not ready: Netw...ialized
Hint: Some lines were ellipsized, use -l to show in full.

2. kubeadm init pulls from the image registry even though the images were imported offline

This one is a real pitfall: the configuration file generated by the command specifies v1.17.0, while the kubeadm binary and the imported images are v1.17.5, so during initialization no v1.17.0 images can be found locally.

[root@k8s-offline-master01 yaml]# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log  
W0907 02:25:03.765048   12114 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"kubeProxyConfiguration"} for scheme definitions in "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31" and "k8s.io/kubernetes/cmd/kubeadm/app/componentconfigs/scheme.go:28"
W0907 02:25:03.765245   12114 validation.go:28] Cannot validate kubelet config - no validator is available
W0907 02:25:03.765255   12114 validation.go:28] Cannot validate kube-proxy config - no validator is available
[config] WARNING: Ignored YAML document with GroupVersionKind kubeproxy.config.k8s.io/v1alpha1, Kind=kubeProxyConfiguration
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
        [WARNING FileExisting-socat]: socat not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:33440->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:51817->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:40228->[::1]:53: read: connection refused
, error: exit status 1
        [ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.17.0: output: Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:57106->[::1]:53: read: connection refused
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Resolution

# The configuration file was generated with the following command
kubeadm config print init-defaults > kubeadm-config.yaml
...
kubernetesVersion: v1.17.0
...
# The generated version is v1.17.0 while kubeadm and the imported images are v1.17.5;
# Kubernetes determines the required image versions from kubernetesVersion.
# Fix: change the version to
kubernetesVersion: v1.17.5
# and do not change the default image repository path.
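
Before re-running the init, it can help to confirm which image versions kubeadm will actually look for and compare them with what has been imported:

kubeadm config images list --config kubeadm-config.yaml
docker images | grep k8s.gcr.io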

3. kubectl reports that the connection to the port was refused

The symptom:

[k8s@k8s-offline-master01 ~]$ kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Resolution:

# kubectl is not usable until the following commands have been run.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Alternatively:
[root@httk3 ~]# ll /etc/kubernetes/admin.conf
-rw------- 1 root root 5449 Dec  6 18:42 /etc/kubernetes/admin.conf
[root@httk3 ~]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@httk3 yaml]# source ~/.bash_profile
[root@httk3 yaml]# kubectl get node
NAME    STATUS     ROLES    AGE   VERSION
httk3   NotReady   master   30m   v1.17.5
Other nodes showing the same symptom can be fixed by copying admin.conf into their /etc/kubernetes/ directory with scp, as sketched below.
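
Assuming the hostnames from the table at the top (adjust the target node and user for your environment):

# on the master
scp /etc/kubernetes/admin.conf root@k8s-offline-node01:/etc/kubernetes/admin.conf
# on the worker node
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get node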

4. Adding the network plugin (flannel fails to start)

This one is also a classic. Initially, my test environment had only the IP address configured in the network config file ifcfg-ens33 and no gateway. flannel first checks whether an interface is explicitly specified; if not, it falls back to the interface of the default gateway, and if no gateway is configured at all, flannel cannot start. See the source code below.

Normal SandboxChanged 113s (x12 over 2m27s) kubelet, k8s-offline-node01 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 113s (x5 over 118s) kubelet, k8s-offline-node01 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-flannel-ds-amd64-5xnxj": Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/docker/containerd/containerd.sock: connect: connection refused": unavailable

[root@k8s-offline-master01 yaml]# kubectl logs -f kube-flannel-ds-amd64-8vhf6 -n kube-system 
I0906 20:33:56.654980       1 main.go:518] Determining IP address of default interface
E0906 20:33:56.655099       1 main.go:204] Failed to find any valid interface to use: failed to get default interface: Unable to find default route
[root@k8s-offline-master01 yaml]#

Resolution:

The worker node had no gateway configured; add it to the interface configuration file:
NETMASK=255.255.255.0
GATEWAY=192.168.231.2
Source code:
...
 } else { // no interface specified, flannel decides on its own
        log.Info("Determining IP address of default interface")
        // Linux:   find the interface that holds the gateway from the route table
        // Windows: get the information via the command line: netsh interface ipv4 show addresses
        if iface, err = ip.GetDefaultGatewayIface(); err != nil {
            return nil, fmt.Errorf("failed to get default interface: %s", err)
        }
...
# Reference:
https://blog.csdn.net/xxb249/article/details/86422165