Building a High-Availability External etcd Cluster with Static Pods: Step by Step

Nov 14, 2023

According to the official documentation, there are two ways to run ETCD in a high-availability Kubernetes cluster. One stacks ETCD on the master nodes, while the other separates ETCD into a dedicated three-node cluster.

This article will explain how to set up an external ETCD cluster, operate and manage it using Static Pods, and finally integrate it with a Kubernetes cluster.

This article will cover the following:

1. Why Establish an External ETCD?
2. Why Use the Static Pod Approach?
3. Implementation
4. Integration with Kubernetes Cluster Setup
5. Conclusion

Let’s get started!

1. Why Establish an External ETCD?

If ETCD is separated, the architecture will look as illustrated in the diagram below:

By separating the master and ETCD members, this setup ensures that if any master or ETCD member encounters an issue, it won’t heavily impact the overall system. It’s a way to minimize the consequences of potential problems in either the master or ETCD components.

However, this configuration comes with a trade-off. It requires more nodes (6 in total: three masters and three ETCD members), which means higher resource demands. Additionally, because ETCD and the masters communicate over the network, latency or connectivity problems between them can affect the whole cluster. Factor in these considerations during your planning.
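
The choice of three ETCD members follows from ETCD's majority quorum rule: a cluster of N members needs N/2 + 1 (integer division) of them available to accept writes. The sketch below only does that arithmetic; the function names are made up for illustration.

```shell
# Majority quorum for an etcd cluster of N voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster still has quorum.
fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With three members the cluster survives the loss of one. A fourth member would raise the quorum to three without tolerating any additional failure, which is why odd sizes are the usual choice.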

2. Why Use the Static Pod Approach?

A Static Pod refers to a Pod running on a specified node, directly managed by Kubelet without going through the apiserver. This differs from the typical Pod management approach (e.g., using Deployments). From the perspective of the Kubernetes apiserver, a Static Pod is visible but cannot be managed.

Kubelet manages Static Pods by:

Restarting a Static Pod after it crashes.
Keeping the Static Pod bound to the Kubelet of the specified node, so it is never rescheduled elsewhere.
Automatically creating a mirror pod for each Static Pod on the Kubernetes apiserver, which is what makes the pod visible there without making it manageable.
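
To make the mechanism concrete, here is a minimal sketch of what "creating" a Static Pod means: a manifest file is dropped into the directory the Kubelet watches. The pod name static-web and the nginx image are placeholders, and the sketch writes to a temporary directory instead of a real node's manifest path so it can be run anywhere.

```shell
# Directory the kubelet watches for static pod manifests. On a real node this
# is typically /etc/kubernetes/manifests; a temp dir is used here so the
# sketch does not touch a live node.
MANIFEST_DIR="${MANIFEST_DIR:-$(mktemp -d)}"

# Write a minimal static pod manifest (hypothetical pod "static-web").
write_static_pod() {
  cat > "$1/static-web.yaml" << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
EOF
}

write_static_pod "$MANIFEST_DIR"
echo "wrote $MANIFEST_DIR/static-web.yaml"
```

On a node whose Kubelet watches that directory, the pod starts without any kubectl command, and deleting the file stops it. The ETCD manifest generated later in this article works the same way.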

Why would I use the Static Pod method to establish an external ETCD?

The answer is to match the way ETCD already runs inside a Kubernetes cluster. How so? Here are common examples of Static Pods in Kubernetes:

etcd
kube-apiserver
kube-controller-manager
kube-scheduler

As mentioned above, in a stacked architecture ETCD already runs as a Static Pod inside Kubernetes. Separating ETCD while keeping it in that same form makes future integration and management more consistent and straightforward.

3. Implementation

Next, let’s move on to the implementation part. The following explains how to set up ETCD on three independent nodes using the static pod approach.

(Step 0). Preparing for the Setup

#----------------------------------------------
# S3-1. Turn off swap (all nodes)
#----------------------------------------------
[etcdX]# swapoff -a && sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

#---------------------------------------------
# S3-2. Install iproute-tc (all nodes)
#---------------------------------------------
[etcdX]# yum install iproute-tc
[etcdX]# vim /etc/modules-load.d/k8s.conf
overlay
br_netfilter

[etcdX]# modprobe overlay
[etcdX]# modprobe br_netfilter

[etcdX]# vim /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1

[etcdX]# sysctl --system
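
Before moving on, it can help to confirm that the preparation actually took effect. The following is a rough sketch, not part of the original procedure: the helper names are hypothetical, and each one takes a file path so it can be tried against sample files as well as the real /proc entries.

```shell
# Return 0 when the swaps table (default /proc/swaps) holds only its header
# line, i.e. no swap device is active.
swap_is_off() {
  awk 'NR > 1 { n++ } END { exit (n > 0) }' "${1:-/proc/swaps}" 2>/dev/null
}

# Return 0 when a module name is listed (default /proc/modules).
# Note: modules compiled into the kernel do not appear in that list.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}" 2>/dev/null
}

# Return 0 when the given sysctl file (a path under /proc/sys) contains "1".
sysctl_is_one() {
  [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

swap_is_off && echo "swap: off" || echo "swap: still on (or /proc/swaps unreadable)"

for m in overlay br_netfilter; do
  module_loaded "$m" && echo "$m: loaded" || echo "$m: not listed"
done

for key in net/bridge/bridge-nf-call-iptables net/ipv4/ip_forward net/bridge/bridge-nf-call-ip6tables; do
  sysctl_is_one "/proc/sys/$key" && echo "$key = 1" || echo "$key: not set to 1"
done
```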

(Step 1). Installing Essential Software

#---------------------------------------------
# S3-3. Install runtime (all nodes)
#---------------------------------------------
[etcdX]# export VERSION=1.27
[etcdX]# curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_8/devel:kubic:libcontainers:stable.repo
[etcdX]# curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/CentOS_8/devel:kubic:libcontainers:stable:cri-o:$VERSION.repo
[etcdX]# yum install cri-o -y
[etcdX]# systemctl enable --now crio
[etcdX]# crio version
[etcdX]# yum list cri-o --showduplicates|sort -r > crio.version

#---------------------------------------------
# S3-4. kubelet, kubeadm, kubectl (all nodes)
#---------------------------------------------
[etcdX]# vim /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

[etcdX]# yum clean all ; yum repolist

[etcdX]# yum list kubelet --showduplicates|sort -r > kubelet.version
[etcdX]# yum list kubeadm --showduplicates|sort -r > kubeadm.version
[etcdX]# yum list kubectl --showduplicates|sort -r > kubectl.version

[etcdX]# yum install kubelet-1.27.6-0 kubeadm-1.27.6-0 kubectl-1.27.6-0

#---------------------------------------------
# S3-5. Creating systemd Service (All Nodes)
# Since we are not yet establishing the Kubernetes cluster, 
# attempting to start with the original kubelet.conf might lead to issues. 
# We need to create a new service startup file.
#---------------------------------------------
[etcdX]# vim /etc/crio/crio.conf
=> cgroup_manager = "systemd"
[etcdX]# cat /etc/crio/crio.conf | grep cgroup_manager
[etcdX]# systemctl restart crio

[etcdX]# cat << EOF > /usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/crio/crio.sock
Restart=always
EOF

[etcdX]# systemctl daemon-reload
[etcdX]# systemctl restart kubelet

※ Important Notes:

The kubeadm drop-in path has been updated to /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf (for kubeadm 1.13.5 and later), which is why the override above is created in that same directory. Following the path in the official document may result in kubelet startup issues.
Kubelet and CRI-O must use the same cgroup driver. Because kubelet defaults to cgroupfs, set cgroup_manager = "systemd" in /etc/crio/crio.conf and pass --cgroup-driver=systemd to kubelet.
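
A mismatch between the two cgroup settings is easy to miss, so a quick comparison can be scripted. This is only a sketch under the assumption that both values are written the way this article writes them (a quoted cgroup_manager line and a --cgroup-driver= flag); the helper names are hypothetical, and the demo runs on sample files rather than the real configuration.

```shell
# Print the cgroup_manager value from a CRI-O config file (commented lines are ignored).
crio_cgroup_manager() {
  sed -n 's/^[[:space:]]*cgroup_manager[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print the --cgroup-driver value from a kubelet systemd drop-in.
kubelet_cgroup_driver() {
  sed -n 's/.*--cgroup-driver=\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Return 0 only when both values are present and identical.
drivers_match() {
  a="$(crio_cgroup_manager "$1")"
  b="$(kubelet_cgroup_driver "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo on sample files that mirror the settings used in this article.
demo_dir="$(mktemp -d)"
printf 'cgroup_manager = "systemd"\n' > "$demo_dir/crio.conf"
printf 'ExecStart=/usr/bin/kubelet --address=127.0.0.1 --cgroup-driver=systemd --runtime-request-timeout=15m\n' > "$demo_dir/20-etcd-service-manager.conf"

if drivers_match "$demo_dir/crio.conf" "$demo_dir/20-etcd-service-manager.conf"; then
  echo "cgroup drivers match"
else
  echo "cgroup drivers differ"
fi
```

On a node, the same check would be called with /etc/crio/crio.conf and the drop-in file created in S3-5.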

#---------------------------------------------
# S3-6. Creating kubeadm config (all nodes)
#---------------------------------------------
[etcdX]# vim /root/kubeadm_setup.sh
#!/bin/bash
export HOST0=10.107.88.15
export HOST1=10.107.88.16
export HOST2=10.107.88.17

export NAME0="etcd01"
export NAME1="etcd02"
export NAME2="etcd03"

mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/ /tmp/${HOST2}/

HOSTS=(${HOST0} ${HOST1} ${HOST2})
NAMES=(${NAME0} ${NAME1} ${NAME2})

for i in "${!HOSTS[@]}"; do
HOST=${HOSTS[$i]}
NAME=${NAMES[$i]}

cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
nodeRegistration:
 name: ${NAME}
localAPIEndpoint:
 advertiseAddress: ${HOST}
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
etcd:
 local:
     serverCertSANs:
     - "${HOST}"
     peerCertSANs:
     - "${HOST}"
     extraArgs:
         initial-cluster: ${NAMES[0]}=https://${HOSTS[0]}:2380,${NAMES[1]}=https://${HOSTS[1]}:2380,${NAMES[2]}=https://${HOSTS[2]}:2380
         initial-cluster-state: new
         name: ${NAME}
         listen-peer-urls: https://${HOST}:2380
         listen-client-urls: https://${HOST}:2379
         advertise-client-urls: https://${HOST}:2379
         initial-advertise-peer-urls: https://${HOST}:2380
EOF
done

[etcdX]# chmod +x /root/kubeadm_setup.sh
[etcdX]# /root/kubeadm_setup.sh
[etcdX]# tree /tmp/<etcd_ip>
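
The initial-cluster value in the script is the part most likely to be mistyped, since every member must receive exactly the same string. As an illustration, the sketch below builds it from name=ip pairs; the function name and the pair format are assumptions of this example, not something kubeadm requires.

```shell
# Build etcd's initial-cluster value from "name=ip" pairs.
# Every member is listed with its peer URL on port 2380.
initial_cluster() {
  out=""
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    ip="${pair#*=}"      # text after the first "="
    out="${out:+$out,}${name}=https://${ip}:2380"
  done
  echo "$out"
}

initial_cluster etcd01=10.107.88.15 etcd02=10.107.88.16 etcd03=10.107.88.17
```

The printed line should match the initial-cluster entry in each generated kubeadmcfg.yaml.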

(Step 2). Creating Certificates

#------------------------------------------------
# S3-7. Generating CA (etcd01)
#------------------------------------------------
[etcdX]# kubeadm init phase certs etcd-ca
[etcdX]# tree /etc/kubernetes/pki/etcd/
ca.crt
ca.key

#------------------------------------------------
# S3-8. Creating Certificates for Each Member (etcd01)
#------------------------------------------------
[etcdX]# export HOST0=10.107.88.15
[etcdX]# export HOST1=10.107.88.16
[etcdX]# export HOST2=10.107.88.17
[etcdX]# kubeadm init phase certs etcd-server --config=/tmp/${HOST2}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-peer --config=/tmp/${HOST2}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
[etcdX]# cp -R /etc/kubernetes/pki /tmp/${HOST2}/

# Deleting certificates that must not be reused for the next member
[etcdX]# find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
[etcdX]# tree /etc/kubernetes/pki/etcd
/etc/kubernetes/pki/etcd
├── ca.crt
└── ca.key
(Only the CA generated in S3-7 is kept)

[etcdX]# kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
[etcdX]# cp -R /etc/kubernetes/pki /tmp/${HOST1}/
[etcdX]# find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete

[etcdX]# kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
[etcdX]# kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
=> For HOST0, no copy is needed because the certificates are already on the local machine

# Removing ca.key from the bundles: the CA private key must not be copied off HOST0
[etcdX]# find /tmp/${HOST2} -name ca.key -type f -delete
[etcdX]# find /tmp/${HOST1} -name ca.key -type f -delete
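
The certificate step repeats the same four kubeadm phases for each member, which is easier to see when written as a loop. The sketch below is a dry run that only prints the commands; it does not call kubeadm, and the function name is hypothetical.

```shell
# Print the four kubeadm certificate commands that S3-8 runs for one member.
cert_commands() {
  cfg="/tmp/$1/kubeadmcfg.yaml"
  for phase in etcd-server etcd-peer etcd-healthcheck-client apiserver-etcd-client; do
    echo "kubeadm init phase certs $phase --config=$cfg"
  done
}

# Dry run in the same order as above: HOST2, HOST1, then HOST0 last, so the
# certificates left in /etc/kubernetes/pki belong to the local machine.
for host in 10.107.88.17 10.107.88.16 10.107.88.15; do
  cert_commands "$host"
done
```

Between members, the copy and cleanup commands from S3-8 still have to run; the loop is only meant to show the repeated structure.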

#------------------------------------------------
# S3-9. copy to all other hosts (etcd01)
#------------------------------------------------
[etcdX]# USER=root
[etcdX]# HOST=${HOST1}
[etcdX]# scp -r /tmp/${HOST}/* ${USER}@${HOST}:
[etcdX]# ssh ${USER}@${HOST}
# The next two commands run on the remote host
[etcdX]# chown -R root:root pki
[etcdX]# mv pki /etc/kubernetes/
# Repeat this block with HOST=${HOST2}

#------------------------------------------------
# S3-10. Verify (Ensure paths are the same on each host)
#------------------------------------------------
[etcdX]# tree /root
[etcdX]# tree /etc/kubernetes/pki
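
Reading tree output by eye works, but the same verification can be scripted. The sketch below checks that a pki directory contains the files each member needs and that ca.key is present only where it is allowed; the file list reflects what the kubeadm phases above generate, while the helper name and the yes/no argument are inventions of this example.

```shell
# Files each etcd member needs under its pki directory after S3-9.
EXPECTED_FILES="apiserver-etcd-client.crt apiserver-etcd-client.key etcd/ca.crt etcd/healthcheck-client.crt etcd/healthcheck-client.key etcd/peer.crt etcd/peer.key etcd/server.crt etcd/server.key"

# Print problems found in a pki directory; return 0 only when there are none.
# $1: pki directory   $2: "yes" when ca.key is allowed (the node that created the CA)
check_pki() {
  dir="$1"
  allow_ca_key="${2:-no}"
  status=0
  for f in $EXPECTED_FILES; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  if [ "$allow_ca_key" = "no" ] && [ -e "$dir/etcd/ca.key" ]; then
    echo "unexpected: etcd/ca.key"
    status=1
  fi
  return "$status"
}

# Check the live directory only when it exists, so the sketch is safe to run anywhere.
if [ -d /etc/kubernetes/pki ]; then
  check_pki /etc/kubernetes/pki && echo "pki layout looks complete" || echo "pki layout incomplete"
fi
```

On etcd01, where the CA was generated, the call would be check_pki /etc/kubernetes/pki yes. The check looks only at file names, not at certificate contents.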

(Step 3). Creating ETCD Cluster

#------------------------------------------------
# S3-11. Creating static pod manifest (etcd01/02/03)
#------------------------------------------------
[etcd01]# kubeadm init phase etcd local --config=/tmp/10.107.88.15/kubeadmcfg.yaml
[etcd02]# kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml
[etcd03]# kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml

#-------------------------------------------------
# S3-12. etcdctl installation
#-------------------------------------------------
[etcdX]# ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
[etcdX]# echo $ETCD_RELEASE
v3.5.9

[etcdX]# wget https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[etcdX]# tar zxvf etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[etcdX]# cd etcd-${ETCD_RELEASE}-linux-amd64
[etcdX]# ls -al
[etcdX]# cp -rp etcdctl /usr/local/bin
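
For readers wondering how the release lookup works: the GitHub API response contains a tag_name line, and the grep and cut pipeline pulls the value out of its fourth double-quote-delimited field. The sketch below replays that on a sample line and rebuilds the download URL; the function names are hypothetical and no network access is involved.

```shell
# Extract the release tag from GitHub API JSON read on stdin,
# using the same grep/cut pipeline as S3-12.
extract_tag() {
  grep tag_name | cut -d '"' -f 4
}

# Build the linux-amd64 tarball URL for a given release tag.
etcd_tarball_url() {
  echo "https://github.com/etcd-io/etcd/releases/download/$1/etcd-$1-linux-amd64.tar.gz"
}

# Demo on a sample line shaped like the API response.
tag="$(printf '  "tag_name": "v3.5.9",\n' | extract_tag)"
echo "$tag"
etcd_tarball_url "$tag"
```

Because the lookup returns whatever the latest release is at the time, pinning ETCD_RELEASE to a fixed tag gives more repeatable installs.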

#------------------------------------------------
# S3-13. Verify etcd cluster
#------------------------------------------------
[etcdX]# ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://10.107.88.15:2379,https://10.107.88.16:2379,https://10.107.88.17:2379 endpoint health

https://10.107.88.15:2379 is healthy: successfully committed proposal: took = 10.08693ms
https://10.107.88.16:2379 is healthy: successfully committed proposal: took = 10.912799ms
https://10.107.88.17:2379 is healthy: successfully committed proposal: took = 10.461484ms
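
If this check is going to be repeated, for example after a node reboot, it can be wrapped in two small helpers. This is a sketch with hypothetical function names: one builds the comma-separated endpoint list, the other counts the "is healthy" lines in output shaped like the sample above.

```shell
# Join member IPs into a comma-separated list of client URLs (port 2379).
endpoints_arg() {
  out=""
  for ip in "$@"; do
    out="${out:+$out,}https://${ip}:2379"
  done
  echo "$out"
}

# Read "endpoint health" output on stdin; return 0 only when the number of
# "is healthy" lines equals the expected member count ($1).
all_healthy() {
  healthy="$(grep -c 'is healthy' || true)"
  [ "$healthy" -eq "$1" ]
}

endpoints_arg 10.107.88.15 10.107.88.16 10.107.88.17

# Demo on the sample output shown above.
all_healthy 3 << 'EOF' && echo "all members healthy"
https://10.107.88.15:2379 is healthy: successfully committed proposal: took = 10.08693ms
https://10.107.88.16:2379 is healthy: successfully committed proposal: took = 10.912799ms
https://10.107.88.17:2379 is healthy: successfully committed proposal: took = 10.461484ms
EOF
```

On a node, the etcdctl command from S3-13 would be piped into all_healthy 3. Redirecting stderr into the pipe (2>&1) is a reasonable precaution in case a given etcdctl version writes some of its result lines there.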

4. Integration with Kubernetes Cluster Setup

#------------------------------------------
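# S4-1. Copy the etcd CA and client certificate from any etcd to master01
#------------------------------------------
[etcd]# ssh root@master01 "mkdir -p /root/etcd-ca"
[etcd]# scp -p /etc/kubernetes/pki/etcd/ca.crt root@master01:/root/etcd-ca/
[etcd]# scp -p /etc/kubernetes/pki/apiserver-etcd-client.crt root@master01:/root/etcd-ca/
[etcd]# scp -p /etc/kubernetes/pki/apiserver-etcd-client.key root@master01:/root/etcd-ca/

# On master01, place the files at the paths referenced in S4-2
[root]# mkdir -p /etc/kubernetes/pki/etcd
[root]# cp /root/etcd-ca/ca.crt /etc/kubernetes/pki/etcd/
[root]# cp /root/etcd-ca/apiserver-etcd-client.crt /root/etcd-ca/apiserver-etcd-client.key /etc/kubernetes/pki/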

#------------------------------------------
# S4-2. Configure kubeadm-config.yaml on master01
#------------------------------------------
[root]# vim kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "test.example.poc"
controlPlaneEndpoint: "10.107.88.9:6443"
etcd:
  external:
    endpoints:
    - https://10.107.88.15:2379
    - https://10.107.88.16:2379
    - https://10.107.88.17:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
networking:
  podSubnet: "10.244.0.0/16"

#------------------------------------------
# S4-3. Initialize master01
#------------------------------------------
[root]# kubeadm init --config kubeadm-config.yaml --upload-certs
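
kubeadm init fails if the certificate files named in kubeadm-config.yaml are not present on master01, so a pre-flight check before initializing can save a retry. The sketch below assumes the paths are written unquoted on caFile, certFile and keyFile lines, as in the file above; the helper names and the optional root prefix (handy for trying it on a scratch directory) are made up for this example.

```shell
# Print the caFile, certFile and keyFile paths found in a kubeadm config.
etcd_client_paths() {
  awk '$1 == "caFile:" || $1 == "certFile:" || $1 == "keyFile:" { print $2 }' "$1"
}

# Return 0 only when every referenced file exists.
# $1: kubeadm config file   $2: optional directory prefix prepended to each path
check_etcd_client_files() {
  cfg="$1"
  root="${2:-}"
  status=0
  for p in $(etcd_client_paths "$cfg"); do
    if [ ! -f "$root$p" ]; then
      echo "missing: $p"
      status=1
    fi
  done
  return "$status"
}

# Run the check only when the config from S4-2 is in the current directory.
if [ -f kubeadm-config.yaml ]; then
  check_etcd_client_files kubeadm-config.yaml && echo "etcd client files present" || echo "etcd client files incomplete"
fi
```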

5. Conclusion

The above is the procedure for separating ETCD from the Control Plane. In practice, this is the less common architecture. The prevalent approach is a stacked architecture, where ETCD runs on the Control Plane nodes, and solutions used in enterprise environments (e.g., OpenShift) typically adopt a stacked ETCD as well. Here are a few points of comparison between the stacked and external approaches:

[Stacked]: Because they share a node, the apiserver talks to its local ETCD member over loopback. That traffic never leaves the host, which benefits both speed and stability.
[External]: Every apiserver request to ETCD crosses the TCP/IP network between nodes.
[Stacked]: ETCD is sensitive mainly to disk I/O, and when it runs on the same node as the apiserver, the network won't become a bottleneck between them.
[External]: Network bandwidth and latency directly affect overall service performance.
[Stacked]: If a Control Plane member goes down, the ETCD member on the same node goes down with it.
[External]: A failed Master or ETCD member affects only its own tier and won't by itself cause an immediate system breakdown.
[Stacked]: Simple deployment, no additional nodes needed.
[External]: Requires separate deployment and additional nodes.

This article explained the process of establishing an External ETCD and integrating it into a Kubernetes Cluster deployment. It also briefly outlined the practical differences between the two approaches for your reference.

Thank you for reading this somewhat lengthy article.

