Local volume vs HostPath(en)
This article explains the differences between two Kubernetes volume types, local volume and hostPath, along with some considerations for using each.
Let’s get started!
1. HostPath
hostPath lets you mount files, directories, sockets, and block devices from the node into a Pod.
Here is an example of hostPath usage:
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example
spec:
  containers:
    - name: hostpath-container
      image: nginx
      volumeMounts:
        - name: hostpath-volume
          mountPath: /var/www/html
  volumes:
    - name: hostpath-volume
      hostPath:
        path: /var/data
        type: Directory
Here, hostPath mounts the /var/data directory from the host at the /var/www/html path inside the container. The Directory type requires that the directory already exists on the node.
When using hostPath volumes:
- It is recommended to combine hostPath volumes with a nodeSelector so that the Pod is scheduled onto the node that actually holds the data, but this can become cumbersome when dealing with a large number of Pods or Nodes.
- If you use the DirectoryOrCreate or FileOrCreate type, make sure that the kubelet has the necessary permissions to create files or directories on the node.
- If files or directories on the node are created by root and then mounted into a container, make sure that the container process has the appropriate permissions to read and write them. You may need to adjust the file permissions on the node, or the user the container runs as, to allow the desired access.
- The Kubernetes scheduler does not consider the size of hostPath volumes, and there is no built-in way to set size limits for them.
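The first two considerations can be sketched in a single manifest. This is only a minimal illustration, not a recommended production setup: the node name node-1 and the Pod name hostpath-pinned-example are hypothetical, and it assumes the node carries the standard kubernetes.io/hostname label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pinned-example   # hypothetical name
spec:
  # Pin the Pod to the node that holds the data (node-1 is an assumed node name).
  nodeSelector:
    kubernetes.io/hostname: node-1
  containers:
    - name: hostpath-container
      image: nginx
      volumeMounts:
        - name: hostpath-volume
          mountPath: /var/www/html
  volumes:
    - name: hostpath-volume
      hostPath:
        path: /var/data
        # If nothing exists at this path, the kubelet creates an empty
        # directory, so it needs permission to write there on the node.
        type: DirectoryOrCreate
```

Note that the nodeSelector has to be repeated in every Pod that uses the path, which is where the approach becomes cumbersome at scale.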
2. Local volume
A local volume mounts a local storage resource such as a disk, partition, or directory, and exposes it to Kubernetes as a static Persistent Volume (PV).
Its main purpose is to address the issues associated with hostPath.
Local volumes require additional logic and handling by the PV controller and scheduler to ensure that when a Pod needs to be rescheduled, it can be placed on the same node where the local volume resides.
One of the key benefits of using local volumes is the assurance that Pods and PVs will always be scheduled to the same worker node, which can be crucial for certain applications with specific data locality requirements.
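From the Pod's point of view, this can look like the following sketch. It assumes that a local PV and a StorageClass named local-storage already exist in the cluster; the names local-pvc, local-volume-example, and the 10Gi size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage # assumed to exist in the cluster
  resources:
    requests:
      storage: 10Gi               # illustrative size
---
apiVersion: v1
kind: Pod
metadata:
  name: local-volume-example      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /var/www/html
  volumes:
    - name: data
      # No nodeSelector here: the scheduler places the Pod on the node
      # required by the node affinity of the bound PV.
      persistentVolumeClaim:
        claimName: local-pvc
```

Unlike the hostPath example, the Pod spec carries no node information; the placement constraint lives on the PV.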
3. Appropriate scenarios
- Preloading Data from Remote Storage: Data is pre-loaded from remote storage into a local directory to speed up data access by Pods (caching). In this case, the access is read-only, so data integrity is not a concern. Some AI training processes also adopt this approach.
- Local Software-Defined Storage (SDS) Solutions: When a local SDS solution that inherently provides data replicas is in place (e.g. Ceph, Portworx), local volumes can be employed effectively.
- Not Suitable for Scalable Environments: Local volumes are not well-suited for environments where scaling and dynamic resource allocation are essential.
4. Considerations for Using Local Volumes
- When defining Persistent Volumes (PVs), use .spec.nodeAffinity to specify the binding relationship between a local volume and a node.
- If a "local-storage" StorageClass is used, set volumeBindingMode: WaitForFirstConsumer to implement delayed binding. This ensures that the PV controller does not immediately bind the PV to a PVC. Instead, it waits until a Pod that requires this local PV has been scheduled before performing the binding.
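Both points can be sketched together as follows. This is a minimal example under stated assumptions: the PV name local-pv, the path /mnt/disks/ssd1, the node name node-1, and the 10Gi capacity are hypothetical and must match a real disk and node in your cluster.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
# Local volumes are created statically, so no dynamic provisioner is used.
provisioner: kubernetes.io/no-provisioner
# Delay PV/PVC binding until a Pod using the PVC is scheduled.
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv                  # hypothetical name
spec:
  capacity:
    storage: 10Gi                 # illustrative size
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1         # assumed mount point on the node
  # Ties this PV to the node that physically holds the storage.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1          # assumed node name
```

With delayed binding, the scheduler can take the Pod's other constraints (resources, affinity, taints) into account before a PV on a particular node is chosen.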