Understanding the Basics of Internal Networking in Kubernetes

Albert Weng
Feb 5, 2024


Recently, I’ve been diving into some questions about K8S networking within our internal systems. It took me some time to understand how various resources communicate within a Kubernetes cluster. After thoroughly reading articles from experts online, I’ve compiled my findings, added some personal insights, and created a reference document for myself and anyone else dealing with similar issues.

Given that this article leans more towards explanation and theory, and the content might be a bit extensive, I appreciate your patience. Here are the key points I’ll be covering in this piece:

  1. Basic Concepts of K8S Networking
  2. Container-to-Container
  3. Pod-to-Pod
  4. Pod-to-Service
  5. External-to-Service
  6. Service Discovery
  7. Types of Services
  8. Conclusion

1. Basic Concepts of K8S Networking

Kubernetes networking is designed to let the entities running inside a K8S environment communicate with one another. Kubernetes achieves this through a layered design that distinguishes entities at different levels (Namespace/Pod/Container), which makes communication between those layers a central concern of the K8S ecosystem.

※ Essentially, Kubernetes networking is designed to address the following requirements:

  1. Enable communication between Pods without using NAT.
  2. Facilitate communication between Pods on different nodes without using NAT.
  3. Ensure that a Pod sees its own IP the same way other Pods see it.

To meet these three requirements and align with the layered design philosophy, various scenarios have been developed, which will be discussed in the following sections.

2. Container-to-Container

Container-to-container networking takes place inside the Pod's network namespace.

Each Pod has its own network namespace, and every container in the Pod joins it. Containers within the same Pod therefore share the same IP address and port space (so two containers in one Pod cannot listen on the same port). The green lines in the diagram below illustrate the communication between two containers within the same Pod.

In Linux, we can use the ‘ip netns’ command to create an isolated network namespace for a process. Within this logical namespace, the process runs with its own set of routes, firewall rules, and network interfaces. A network namespace therefore essentially provides a completely new network stack for all processes within the same namespace.

[root]# ip netns add ns1
[root]# ls /var/run/netns
ns1
[root]# ip netns
ns1

In fact, Linux itself establishes a root network namespace at boot and places all processes in it by default; this root namespace is the one that receives all external traffic.
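As a quick Linux-only check, the kernel exposes each process's network namespace as a link under /proc, identified by an inode number. The sketch below shows that a child process inherits its parent's netns by default, which is why ordinary processes all end up in the root namespace:

```python
import os
import subprocess

# The kernel identifies each namespace by an inode number; two processes
# share a network namespace iff these /proc links point to the same inode.
parent_ns = os.readlink("/proc/self/ns/net")   # e.g. "net:[4026531992]"

# A child process inherits its parent's network namespace by default.
child_ns = subprocess.check_output(
    ["python3", "-c", "import os; print(os.readlink('/proc/self/ns/net'))"],
    text=True,
).strip()

print(parent_ns == child_ns)  # True: both processes share the same netns
```

Running `ip netns add ns1` (as above) creates a second, separate inode, and processes moved into it get their own routes, firewall rules, and interfaces.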

3. Pod-to-Pod

Within a K8S cluster, each node has a designated IP range allocated for Pod use. This ensures that each Pod is assigned a unique IP address (a real IP); a newly created Pod will never receive an address that is already in use. Unlike container-to-container networking, where communication happens inside a shared namespace, Pod-to-Pod communication uses these IP addresses directly, whether the Pods are deployed on the same node or on different nodes.

The diagram above illustrates how Pods communicate with each other. This is achieved through virtual Ethernet devices or veth pairs (veth0/veth1), which facilitate the movement of traffic between the Pod network namespace and the root network namespace. A Virtual Bridge (L2) connects these virtual interfaces, allowing traffic to move via ARP (Address Resolution Protocol).

veth0 and veth1 form a veth pair.

The Virtual Bridge inspects the destination of each transmitted packet to decide whether to forward it to a connected segment. It also maintains a forwarding table and, based on MAC-address inspection, decides whether to forward or drop each frame.
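The bridge behavior described above can be sketched as a toy model (not real kernel code): the bridge learns which port each source MAC lives on, forwards frames for known destinations, and floods unknown or broadcast destinations to every other port, which is how ARP discovery works:

```python
# Toy model of an L2 learning bridge with a MAC forwarding table.
class Bridge:
    def __init__(self, ports):
        self.ports = set(ports)      # e.g. {"veth0", "veth1"}
        self.fdb = {}                # forwarding table: MAC -> port

    def receive(self, in_port, src_mac, dst_mac):
        self.fdb[src_mac] = in_port  # learn which port src_mac lives on
        if dst_mac in self.fdb:
            out = self.fdb[dst_mac]
            # A frame destined back out its ingress port is dropped.
            return [] if out == in_port else [out]
        # Unknown or broadcast destination: flood all other ports.
        return sorted(self.ports - {in_port})

br = Bridge({"veth0", "veth1"})
print(br.receive("veth0", "aa:aa", "ff:ff"))  # broadcast -> flood: ['veth1']
print(br.receive("veth1", "bb:bb", "aa:aa"))  # learned MAC -> ['veth0']
```

The interface names and MACs are made up for illustration; a real Linux bridge additionally ages out stale forwarding-table entries.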

Next, let’s take a look at the process of data flowing from Pod1 to Pod2 (movement within the same node), as illustrated in the diagram above:

  • Step 1: Traffic from Pod1 flows through eth0 to the virtual interface veth0 in the Root network namespace.
  • Step 2: The traffic moves through veth0 towards the virtual bridge, which is connected to veth1 (at the L2 level, using ARP protocol for discovery and broadcasting to all connected interfaces).
  • Step 3: The traffic flows through the virtual bridge to veth1 (based on the ARP table defining who should respond).
  • Step 4: The traffic reaches eth0 in Pod2.

Having understood how data flows within the same node, let’s now explore how data flows to other nodes:

  • Step 1: The packet starts from Pod1; when it reaches the virtual bridge and no local destination matches, the bridge forwards it to the default route (eth0). At this point, the packet is ready to leave the current node.
  • Step 2: The network between the nodes must be able to route the destination IP to the correct node (via the cluster’s routing or overlay network).
  • Step 3: The packet enters the root namespace of the other node and, through that node’s virtual bridge, reaches the correct veth1.
  • Step 4: Through the veth pair, the packet is delivered to eth0 of Pod4.
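The routing decision in the cross-node path can be sketched as follows. The per-node Pod CIDRs below are made up for illustration; the logic is simply "local range stays on the bridge, anything else goes out the default route toward the owning node":

```python
import ipaddress

# Hypothetical Pod CIDR allocation: each node owns one /24 of the Pod range.
pod_cidrs = {
    "node1": ipaddress.ip_network("10.244.1.0/24"),
    "node2": ipaddress.ip_network("10.244.2.0/24"),
}

def next_hop(local_node, dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    if dst in pod_cidrs[local_node]:
        return "local bridge"            # same node: stay on the vbridge
    for node, cidr in pod_cidrs.items():
        if dst in cidr:                  # another node owns this Pod IP
            return f"default route -> {node}"
    return "default route -> unknown"

print(next_hop("node1", "10.244.1.7"))   # local bridge
print(next_hop("node1", "10.244.2.9"))   # default route -> node2
```

In a real cluster this lookup is done by the node's routing table (or by the CNI plugin's overlay), not by application code, but the decision made is the same.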

4. Pod-to-Service

Pods are highly dynamic components within a Kubernetes environment. Due to dynamic scaling or the need to recreate Pods based on application service requirements, Pod IPs may change, posing challenges for service availability.

To address this issue, Kubernetes employs the concept of Services, implementing the following:

  • Services are assigned a static virtual IP at the frontend to connect to all backend Pods.
  • This virtual IP balances the traffic directed towards backend Pods.
  • The system continuously tracks the Pods’ IP addresses (even as they change), so clients only ever need to know the frontend Service VIP.

The aforementioned load balancing is implemented in two ways:

(1) IPTABLES: Kube-proxy keeps an eye on API Server changes. When a new service appears, it adds iptables rules to catch traffic going to the Service Cluster IP and Port, directing it to backend Pods. It randomly picks a backend Pod for the flow. This method is reliable with lower system overhead.

(2) IPVS: IPVS is a transport-layer (L4) load-balancing solution built on Netfilter. It uses Netfilter hooks and a hash table, and operates in kernel space. When kube-proxy runs in IPVS mode, it redirects traffic with lower latency and higher throughput than the iptables mode.

Here’s a diagram illustrating the flow direction from Pod to Service:

  • Step 1: The packet exits the Pod namespace through eth0.
  • Step 2: ARP on the vbridge is unaware of the Service VIP, since it does not belong to any connected segment.
  • Step 3: The vbridge therefore sends the packet out via the default route (eth0).
  • Step 4: Before the packet leaves the node, iptables rules installed by kube-proxy (which watches the API server for Service and Pod events) rewrite the packet’s destination from the Service IP to a specific Pod IP.
  • Step 5: iptables uses the Linux kernel’s conntrack utility to remember which Pod was selected, so subsequent packets of the same connection are routed to the same Pod (even across scaling events). In effect, iptables performs the in-cluster load balancing directly on the node.

Conversely, on the return path (Pod back to client), when the reply packet passes through iptables, its source IP is rewritten from the specific Pod IP back to the Service (SVC) IP, so the client sees the reply as coming from the Service it originally contacted.
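Both directions of this rewriting can be sketched as a toy model of kube-proxy's iptables mode: a new connection to the Service VIP is DNATed to a randomly chosen backend Pod, a conntrack-style table pins that choice for the life of the connection, and replies have their source restored to the VIP. All IPs below are made up:

```python
import random

SERVICE_VIP = "10.96.0.10"                 # hypothetical Service cluster IP
BACKENDS = ["10.244.1.7", "10.244.2.9"]    # hypothetical backend Pod IPs
conntrack = {}                             # connection id -> chosen Pod IP

def dnat(conn_id, dst_ip):
    """Inbound: rewrite the Service VIP to a specific backend Pod IP."""
    if dst_ip != SERVICE_VIP:
        return dst_ip                      # not Service traffic: untouched
    if conn_id not in conntrack:           # new connection: pick a backend
        conntrack[conn_id] = random.choice(BACKENDS)
    return conntrack[conn_id]              # same Pod for the whole connection

def un_dnat(conn_id, src_ip):
    """Reply path: restore the Service VIP as the source address."""
    return SERVICE_VIP if conntrack.get(conn_id) == src_ip else src_ip

pod = dnat("conn-1", SERVICE_VIP)
assert dnat("conn-1", SERVICE_VIP) == pod        # sticky per connection
assert un_dnat("conn-1", pod) == SERVICE_VIP     # client still sees the VIP
```

Real iptables does this with DNAT rules and per-rule random probabilities rather than Python, but the observable behavior, random choice on the first packet and stickiness afterwards, is the same.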

5. External-to-Service

Having covered the internal flow within the cluster, let’s now explore scenarios where application services are exposed to the external network. We’ll break it down into two directions: Egress and Ingress.

(1) Egress (Internal to External): Traffic flows from the cluster to the external network. In this scenario, iptables implements source NAT, presenting the traffic as originating from the node rather than the Pod.

Here’s the process:

Step 1: The packet starts from the Pod namespace, connecting through the veth pair to the root namespace.

Step 2: The vbridge, lacking the Dst IP in its records, directs the traffic through the default route.

Step 3: iptables on the node performs source NAT, transforming the Pod IP into the VM IP. If the source IP were left as the Pod IP, the packet might be dropped at the Internet gateway, since the gateway recognizes only the VM IP.

Step 4: With source NAT completed, the packet can now be accepted by the Internet gateway.

Step 5: The Internet gateway performs another NAT, changing the VM internal IP to an external IP, and then moves the packet towards the Internet.

Step 6: On the return journey, the packet follows the same path, and all modifications to the source IP are reverted. This ensures that each layer of the system receives understandable IP addresses: at the node or VM level, within the VM internally, and at the Pod level within the Pod namespace.
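The SNAT step and its reversal on the return journey can be sketched the same way as the Service DNAT above, just in the opposite direction. The IPs here are made up:

```python
NODE_IP = "192.168.1.10"       # hypothetical VM/node IP
snat_table = {}                # connection id -> original Pod source IP

def snat_out(conn_id, src_pod_ip):
    """Egress: remember the Pod IP, present the node IP to the gateway."""
    snat_table[conn_id] = src_pod_ip
    return NODE_IP             # the gateway only recognizes the VM IP

def snat_back(conn_id, dst_ip):
    """Reply path: restore the Pod IP so the packet re-enters the Pod netns."""
    if dst_ip == NODE_IP and conn_id in snat_table:
        return snat_table[conn_id]
    return dst_ip

assert snat_out("c1", "10.244.1.7") == NODE_IP
assert snat_back("c1", NODE_IP) == "10.244.1.7"
```

In a real node this is an iptables MASQUERADE rule plus conntrack state; the sketch only shows the address bookkeeping that makes each layer see an IP it understands.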

(2) Ingress (External to Internal): Traffic flows from the external network into the cluster’s Service. Ingress also has the ability to allow or block specific connections through rules. There are two main methods:

  • Service Load Balancer (L4): A cloud controller automatically creates a load balancer when a Service of type LoadBalancer is created. The load balancer is assigned an IP, and users direct traffic to that IP to communicate with the Service. (e.g., MetalLB for bare-metal clusters)

Step 1: Direct the packet to the Load Balancer.

Step 2: The Load Balancer randomly selects a VM node.

Step 3: On the chosen VM node, the packet is directed to the correct Pod through internal Load Balancer rules (e.g., kube-proxy).

Step 4: iptables executes the correct NAT to route the packet to the correct Pod.

  • Ingress Controller (L7): Operates within the HTTP/HTTPS range of the network stack and runs on top of Services; the controller itself is typically exposed through a NodePort or LoadBalancer Service.

6. Service Discovery

Kubernetes discovers Services through two methods:

1. Environment Variables: The kubelet sets {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT for each active Service. The Service must be created before the Pod for the variables to be injected at Pod creation.
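The naming convention can be sketched as follows: the Service name is upper-cased with dashes turned into underscores, then suffixed with _SERVICE_HOST / _SERVICE_PORT. The Service name and values below are made up:

```python
# Sketch of the Service environment-variable naming convention.
def service_env_vars(name, cluster_ip, port):
    prefix = name.upper().replace("-", "_")   # "redis-master" -> "REDIS_MASTER"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
    }

print(service_env_vars("redis-master", "10.0.0.11", 6379))
# {'REDIS_MASTER_SERVICE_HOST': '10.0.0.11', 'REDIS_MASTER_SERVICE_PORT': '6379'}
```

Kubernetes also injects Docker-links-style variables (e.g., {SVCNAME}_PORT), which this sketch omits.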

2. DNS: Deployed as a Kubernetes service itself, the DNS service consists of DNS server Pods. All Pods in the cluster use this DNS service, and each Pod’s DNS search list includes its own namespace and the cluster’s default domain.

Cluster-aware DNS, like CoreDNS, monitors K8S API for new Services, automatically updating DNS records. With DNS enabled, Pods can resolve Services using their DNS names. The Kubernetes DNS server is the only way to access ExternalName Services.
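The DNS name a Pod resolves is composed as <service>.<namespace>.svc.<cluster-domain>, where the default cluster domain is cluster.local. A minimal sketch (the Service and namespace names are made up):

```python
# Sketch of how a Service's fully qualified DNS name is composed.
def service_fqdn(name, namespace, zone="cluster.local"):
    return f"{name}.{namespace}.svc.{zone}"

print(service_fqdn("my-svc", "default"))  # my-svc.default.svc.cluster.local
```

Because the Pod's own namespace is on its DNS search list, a Pod in the same namespace can resolve the short name "my-svc" alone, while Pods in other namespaces must use at least "my-svc.default".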

7. Types of Services

Kubernetes Service offers a method to access a group of Pods, typically implemented using label selectors. Applications may need to access other services within the same cluster or expose services to the external world. Kubernetes ServiceTypes allow you to specify how a Service should be exposed.

  • ClusterIP: Default Service Type that allows communication between Application services within the same cluster. It does not accept external connections.
  • LoadBalancer: Requires a Cloud Load Balancer provider. Traffic directly reaches the backend Pods from an external Load Balancer, and the flow is determined by the Cloud LB.
  • NodePort: Opens a specific port on all nodes, enabling external services to forward traffic to the Service via the designated port. The traffic is then directed to the backend Pods based on rules.

8. Conclusion

This article explains key concepts in Kubernetes internal networking, detailing how communication works from individual containers to interactions between internal and external networks. Understanding these concepts is crucial for effective K8S cluster management, helping troubleshoot connectivity issues and optimize load distribution among Pods. A solid grasp of these basics is essential to prevent avoidable challenges during maintenance.

Thanks for reading, and until next time, happy clustering!

