Adding and Setting Up a New Node¶
All cluster nodes are k3s server nodes (etcd members). There are no agent-only nodes currently.
Prerequisites¶
- Node is on the same L2 network as existing nodes
- SSH access as root
- Node IP is known and reachable from the existing nodes
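The prerequisites can be sanity-checked before joining. A minimal sketch (all IPs are placeholders; `nc` is assumed to be available):

```shell
# From an existing node: confirm the new node is reachable and accepts SSH.
ping -c 3 <new-node-ip>
nc -zv <new-node-ip> 22

# From the new node: confirm the API port on the first node is reachable,
# since the join command below connects to it.
nc -zv <first-node-ip> 6443
```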
Step 1 — Join the Cluster¶
Run the following on the new node, substituting the first node's IP and the cluster token (found at /var/lib/rancher/k3s/server/token on any existing server node):
curl -sfL https://get.k3s.io | sh -s - server \
--server https://<first-node-ip>:6443 \
--token "<token>" \
--flannel-backend=none \
--disable-network-policy \
--disable=servicelb \
--disable-kube-proxy \
--node-ip=<this-node-ip>
From any existing node, verify the new node appears:
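For example:

```shell
# Run on any existing server node; the new node should be listed.
kubectl get nodes -o wide
```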
The new node will show NotReady for a short time while Calico rolls out to it.
Step 2 — Verify Calico and WireGuard¶
Wait for the Calico node pod to be Running:
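One way to watch for it (the `calico-system` namespace matches the `kubectl exec` command in Step 4; the `k8s-app=calico-node` label and the timeout value are assumptions, adjust to your install):

```shell
# List the calico-node pods; the one on the new node should reach Running.
kubectl get pods -n calico-system -l k8s-app=calico-node -o wide

# Or block until all calico-node pods report Ready.
kubectl wait -n calico-system --for=condition=Ready pod \
  -l k8s-app=calico-node --timeout=120s
```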
Then confirm the node has received a WireGuard key:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.projectcalico\.org/WireguardPublicKey}{"\n"}{end}'
Every node in the list must show a key. A blank entry means WireGuard has not activated on that node yet — wait and recheck.
Step 3 — Add TLS SAN (optional but recommended)¶
The API server certificate is generated at bootstrap with a --tls-san entry for each node IP. Adding the new node's IP avoids TLS verification errors if the API is ever contacted via that node's address directly.
On every existing server node, add the new IP to /etc/rancher/k3s/config.yaml:
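A sketch of the relevant config.yaml fragment (the existing entries and surrounding keys are assumptions; keep whatever the file already contains and append the new IP):

```yaml
# /etc/rancher/k3s/config.yaml
tls-san:
  - <existing-node-ip-1>
  - <existing-node-ip-2>
  - <new-node-ip>   # append the new node's IP
```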
Then restart k3s on each node one at a time (allow it to become Ready before moving to the next):
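For example, on each existing server node in turn (assuming the script-installed k3s systemd unit name):

```shell
# Restart the k3s server service on this node.
systemctl restart k3s

# Confirm this node is Ready again before restarting the next one.
kubectl get nodes
```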
Step 4 — Verify Node is Healthy¶
# Node is Ready
kubectl get node <new-node-name>
# No stuck pods scheduled to this node
kubectl get pods -A --field-selector spec.nodeName=<new-node-name> | grep -Ev 'Running|Completed'
# Calico BGP peer established
kubectl exec -n calico-system ds/calico-node -- calico-node -show-status
Maintenance: Draining and Uncordoning¶
Before rebooting or performing maintenance on a node:
# Evict all pods, mark unschedulable
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# After maintenance is done, re-enable scheduling
kubectl uncordon <node-name>
Warning
DaemonSet pods (Calico, Traefik) are not evicted by drain — they will restart on their own after the node comes back. Pass --ignore-daemonsets to avoid the drain failing.
Removing a Node¶
Drain the node, then delete its node object with kubectl; k3s removes the corresponding etcd member automatically when the node object is deleted. With 4 server nodes, etcd quorum is 3: the cluster tolerates losing 1 node and remains available, while losing 2 drops below quorum and leaves the cluster read-only at best.
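A sketch of the removal sequence (the uninstall script path is the default for script-based k3s installs; verify it on your machines):

```shell
# From any other server node: evict workloads, then remove the node object.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
# k3s removes the matching etcd member once the node object is deleted.

# On the removed machine itself: wipe the k3s install.
/usr/local/bin/k3s-uninstall.sh
```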