ETCD Part 2: Basic Operations (En)

Albert Weng
4 min read · Oct 5, 2023

This article builds on Part 1 and walks through the basic day-to-day operations of ETCD. If you haven’t read Part 1, you can find it here:

https://weng-albert.medium.com/etcd-backup-restore-part-1-2bb7f8dbb00a

Daily works

Next, we will cover installing the CLI tools, basic operations with some practical insights, and finally how to deal with data inconsistency. I believe that every experience on this journey is valuable.

Continuous study is important, and learning from various situations leads to better growth (note: better, not quicker).

  • Installing CLI Tools and Setting Aliases
  • Basic Operations
  • Dealing with Data Inconsistencies

1. Installing CLI Tools and Setting Aliases

### Install ETCDCTL

[root]# ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
[root]# echo $ETCD_RELEASE
v3.5.9

[root]# wget https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[root]# tar zxvf etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
[root]# cd etcd-${ETCD_RELEASE}-linux-amd64
[root]# ls -al
[root]# cp -rp etcdctl /usr/local/bin
### Setting Up Aliases

[root]# vim /root/.bash_profile
alias etcdctl="ETCDCTL_API=3 /usr/local/bin/etcdctl \
--endpoints=10.107.88.15:2379,10.107.88.16:2379,10.107.88.17:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key"

[root]# source /root/.bash_profile
  • endpoints: Points to port 2379 on the three external ETCD nodes.
  • cacert, cert, key: Refer to the certificates generated during ETCD setup.
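
Bash does not expand aliases in non-interactive scripts by default, so the alias above only helps at an interactive prompt. As a minimal sketch for scripts, the same settings can be supplied through the ETCDCTL_-prefixed environment variables that etcdctl reads for its flags. The values are copied from the alias; as far as I know, etcdctl reports a conflict when the same option is set both as a flag and as a variable, so use one approach or the other in a given shell.

```shell
# Sketch: configure etcdctl through environment variables instead of an alias.
# Values are copied from the alias above; adjust them to your environment.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="10.107.88.15:2379,10.107.88.16:2379,10.107.88.17:2379"
export ETCDCTL_CACERT="/etc/kubernetes/pki/etcd/ca.crt"
export ETCDCTL_CERT="/etc/kubernetes/pki/apiserver-etcd-client.crt"
export ETCDCTL_KEY="/etc/kubernetes/pki/apiserver-etcd-client.key"

# With these exported, a script can call the binary directly, for example:
#   /usr/local/bin/etcdctl member list
```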

2. Basic Operations

### member list

[root]# etcdctl member list

4fc005cb2e00078b, started, etcd03, https://10.107.88.17:2380, https://10.107.88.17:2379, false
db6cb663afc3dec9, started, etcd02, https://10.107.88.16:2380, https://10.107.88.16:2379, false
e8107c88a85d3042, started, etcd01, https://10.107.88.15:2380, https://10.107.88.15:2379, false
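
Later steps (move-leader, member remove) need a member ID rather than a name. The helper below is a small sketch for looking one up; the function name member_id_by_name is my own, and it assumes the default comma-separated output shown above, where the ID is the first column and the name is the third.

```shell
# member_id_by_name NAME
# Reads `etcdctl member list` output (default comma-separated format) on stdin
# and prints the ID of the member whose name equals NAME.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage (relies on the etcdctl settings configured earlier):
#   etcdctl member list | member_id_by_name etcd02
```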
### member list (table)

[root]# etcdctl member list -w table
### move leader

[root]# etcdctl endpoint --cluster=true status -w table
[root]# etcdctl --endpoints 10.107.88.16:2379 move-leader 4fc005cb2e00078b
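
The move-leader request has to be sent to the endpoint of the current leader, which is why the status command is run first. As a rough sketch, the leader can be picked out of the status output with a small filter. The function name current_leader is hypothetical, and the column position is an assumption based on the comma-separated layout of etcd v3.4/v3.5 (endpoint, ID, version, DB size, is-leader, ...); check it against your own output before relying on it.

```shell
# current_leader
# Reads `etcdctl endpoint status --cluster` output (default comma-separated
# format) on stdin and prints "<endpoint> <member-id>" for the leader.
# Assumption: column 5 is the IS LEADER flag, as in etcd v3.4/v3.5.
current_leader() {
  awk -F', ' '$5 == "true" { print $1, $2 }'
}

# Usage:
#   etcdctl endpoint status --cluster | current_leader
```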
### View all keys in a path-like format:

[root]# ETCDCTL_API=3 etcdctl get / --prefix --keys-only
### View specific key:

[root]# etcdctl get /registry/clusterrolebindings/calico-kube-controllers

3. Dealing with Data Inconsistencies

After running the endpoint status command above, I noticed that the DB SIZE differed between members. All three nodes were healthy and the RAFT INDEX values were consistent, but the difference in DB size suggested that data was out of sync between the three members. At the same time, I noticed that some previously created Pods had disappeared. I verified this with the following command:

# kubectl get pod -n test
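
DB size can also differ between members for harmless reasons such as fragmentation, so a second signal is useful. One option is `etcdctl endpoint hashkv --cluster`, which prints a hash of each member's key-value store. The filter below is only a sketch: the name hashes_match is my own, it assumes the default "<endpoint>, <hash>" output, and hashes taken while the cluster is receiving writes may differ briefly even when the members are healthy.

```shell
# hashes_match
# Reads `etcdctl endpoint hashkv --cluster` output (default comma-separated
# format: "<endpoint>, <hash>[, ...]") on stdin.
# Exits 0 when every endpoint reports the same hash, 1 otherwise.
hashes_match() {
  awk -F', ' '
    NF >= 2 { seen[$2] = 1; n++ }
    END {
      distinct = 0
      for (h in seen) distinct++
      if (n > 0 && distinct == 1) exit 0
      exit 1
    }'
}

# Usage:
#   etcdctl endpoint hashkv --cluster | hashes_match || echo "members disagree"
```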

Processing Steps:

  1. Back up the data directory of the healthy ETCD nodes.
  2. Clear the data within the member/ and wal/ directories of the abnormal ETCD node.
  3. Delete the abnormal ETCD member from the healthy ETCD node.
  4. Rejoin the abnormal node to the cluster.
  5. Start the ETCD service.
### Back up the data directory

[master]# etcdctl --endpoints=https://192.168.12.15:2379 snapshot save /backup/etcd01-snapshot.db
[master]# etcdctl --endpoints=https://192.168.12.16:2379 snapshot save /backup/etcd02-snapshot.db   # abnormal
[master]# etcdctl --endpoints=https://192.168.12.17:2379 snapshot save /backup/etcd03-snapshot.db   # leader
### Clear the data within the member/ and wal/ (etcd02)

[etcd02]# tar -czvf etcd-20230308bak.tar.gz /var/lib/etcd
[etcd02]# mv /var/lib/etcd /var/lib/etcd.bak
### Delete the abnormal ETCD member from the healthy ETCD node

[etcd03]# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--endpoints=10.107.88.17:2379 member list

[etcd03]# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--endpoints=10.107.88.17:2379 member remove db6cb663afc3dec9
### Rejoin the abnormal node to the cluster

[etcd02]# mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests-backup/etcd.yaml
[etcd02]# vim /etc/kubernetes/manifests-backup/etcd.yaml
- --initial-cluster=etcd03.test.example.poc=https://10.107.88.17:2380,etcd02.test.example.poc=https://10.107.88.16:2380,etcd01.test.example.poc=https://10.107.88.15:2380
- --initial-cluster-state=existing

[etcd02]# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--endpoints=10.107.88.17:2379 member add etcd02.test.example.poc --peer-urls=https://10.107.88.16:2380

Member db6cb663afc3dec9 added to cluster 71d4ff56f6fd6e70
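
Besides the confirmation line, `etcdctl member add` also prints ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE values for the new member. As a small sketch, those lines can be turned into the manifest arguments edited above, which avoids typing the member list by hand. The function name initial_cluster_flags is hypothetical, and the names in the generated list are whatever the cluster has registered, so compare the result with your manifest before using it.

```shell
# initial_cluster_flags
# Reads the output of `etcdctl member add` on stdin and prints the matching
# static-pod manifest arguments (--initial-cluster, --initial-cluster-state).
initial_cluster_flags() {
  sed -n \
    -e 's/^ETCD_INITIAL_CLUSTER="\(.*\)"$/- --initial-cluster=\1/p' \
    -e 's/^ETCD_INITIAL_CLUSTER_STATE="\(.*\)"$/- --initial-cluster-state=\1/p'
}

# Usage:
#   etcdctl member add etcd02.test.example.poc \
#     --peer-urls=https://10.107.88.16:2380 | initial_cluster_flags
```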
### Start the ETCD service (etcd02)

[etcd02]# mv /etc/kubernetes/manifests-backup/etcd.yaml /etc/kubernetes/manifests/etcd.yaml
### Verify

[master]# for count in {15..17};do ETCDCTL_API=3 etcdctl get / \
--prefix --keys-only --endpoints=https://10.107.88.${count}:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
| wc -l;done

[master]# for count in {15..17};do ETCDCTL_API=3 etcdctl endpoint status --endpoints=https://10.107.88.${count}:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key;\
done
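
The first loop prints one key count per endpoint, and the rejoin is considered successful when the three numbers match. As a minimal sketch, that comparison can be automated with a filter that takes the counts on stdin; the function name counts_agree is my own.

```shell
# counts_agree
# Reads one key count per line on stdin (for example the `wc -l` results of
# the loop above). Exits 0 when all counts are identical, 1 otherwise.
counts_agree() {
  awk '
    NF == 0 { next }
    { n++; if (n == 1) first = $1; else if ($1 != first) differ = 1 }
    END { if (n == 0 || differ) exit 1; exit 0 }'
}

# Usage: pipe the output of the key-count loop into the function, e.g.
#   for count in {15..17}; do ... | wc -l; done | counts_agree && echo "in sync"
```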

This article covers the essential operations of ETCD. In practice, the most common actions are checking its status and performing backup and restore operations.

While direct ETCD operations might not be frequent, it’s crucial to understand ETCD because it plays a critical role. In case of issues with Kubernetes, knowing how to restore the cluster’s state through ETCD backups can be a lifesaver.

In the future, I’ll share more practical insights and guides, including how to recover the Control Plane. Stay tuned!
