Horizontal Pod Autoscaler (HPA): Scaling Workloads in Kubernetes

Kubernetes provides a built-in Horizontal Pod Autoscaler (HPA) that scales workloads automatically. HPA adjusts the number of pod replicas based on CPU, memory, or custom metrics, adding pods under load and removing them when demand drops so that capacity tracks actual usage.
How HPA Works
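The HPA controller reads resource utilization from the Kubernetes Metrics Server.
It compares the observed value against the target you define and adjusts the replica count to match.
It re-evaluates demand on a periodic control loop (every 15 seconds by default) and scales up or down accordingly.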
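The scaling decision follows a simple ratio documented for the HPA controller: desired replicas = ceil(current replicas x current metric value / target metric value). The sketch below reproduces that arithmetic with integer math in shell. The function name desired_replicas is hypothetical, and the real controller additionally applies a tolerance band, stabilization windows, and the minReplicas/maxReplicas bounds, all of which are omitted here.

```shell
# Sketch of the HPA ratio: ceil(current_replicas * current_value / target_value).
# desired_replicas is a hypothetical helper, not part of kubectl.
desired_replicas() {
  current_replicas=$1   # pods currently running
  current_value=$2      # observed average metric (e.g. CPU utilization in %)
  target_value=$3       # target metric (e.g. averageUtilization)
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (current_replicas * current_value + target_value - 1) / target_value ))
}

desired_replicas 2 100 50   # prints 4: utilization is twice the target
desired_replicas 4 20 50    # prints 2: utilization is well below the target
```

Rounding up is deliberate: when the ratio is fractional, the controller prefers one pod too many over one too few.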
Setting Up HPA
Install Metrics Server
Scaling on CPU or memory requires the Metrics Server, which supplies the resource metrics HPA reads. Install it with:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
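Before creating an HPA, it can help to confirm that the Metrics Server is actually serving data, since an HPA without metrics reports its targets as <unknown>. A minimal sketch, assuming the manifest above installed a Deployment named metrics-server in the kube-system namespace (the defaults in the upstream components.yaml). The function name metrics_server_ready and the 120s timeout are illustrative choices.

```shell
# Hypothetical helper: succeeds once the Metrics Server is rolled out and answering queries.
metrics_server_ready() {
  # Wait for the Deployment to finish rolling out (name and namespace assumed from upstream defaults).
  kubectl rollout status deployment/metrics-server -n kube-system --timeout=120s || return 1
  # Node metrics only resolve once the metrics API is being served.
  kubectl top nodes
}
```

Calling metrics_server_ready should end with a table of per-node CPU and memory usage; a non-zero exit status suggests the Metrics Server is not ready yet.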
Deploy a Sample Application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: alpine
          command: ["/bin/sh"]
          args: ["-c", "apk add --no-cache stress-ng && stress-ng --cpu 1 --timeout 600s"]
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
Save the manifest to a file and apply it (my-app.yaml is an assumed filename):
kubectl apply -f my-app.yaml
Configure HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Save the manifest as my-app-hpa.yaml and apply it:
kubectl apply -f my-app-hpa.yaml
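The averageUtilization target is a percentage of each pod's CPU request, not its limit. The sketch below works through the numbers from the manifests above (200m request, 50% target). cpu_utilization_pct is a hypothetical helper, and the 500m usage figure is an assumption that the stress-ng container ends up throttled at its CPU limit.

```shell
# Hypothetical helper: CPU utilization as a percentage of the pod's CPU request.
cpu_utilization_pct() {
  usage_millicores=$1     # observed usage, e.g. as reported by `kubectl top pods`
  request_millicores=$2   # resources.requests.cpu of the container
  echo $(( usage_millicores * 100 / request_millicores ))
}

cpu_utilization_pct 100 200   # prints 50: exactly at the target
cpu_utilization_pct 500 200   # prints 250: assumed usage when throttled at the 500m limit
```

At 250% against a 50% target, two pods would call for ceil(2 x 250 / 50) = 10 replicas, so under these assumptions the Deployment should grow toward the maxReplicas cap of 10 over a few scaling steps.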
Monitoring HPA
Watch HPA activity:
kubectl get hpa --watch
Monitor pod scaling:
kubectl get pods -w
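When the replica count does not change as expected, the HPA's status conditions and events usually explain why. A small sketch, assuming the HPA name my-app-hpa from the manifest above; hpa_status is a hypothetical wrapper around two standard kubectl calls.

```shell
# Hypothetical wrapper: print current vs. desired replicas, then the full status with events.
hpa_status() {
  name=${1:-my-app-hpa}   # HPA name; defaults to the one defined in this article
  kubectl get hpa "$name" -o jsonpath='{.status.currentReplicas} -> {.status.desiredReplicas}{"\n"}'
  kubectl describe hpa "$name"
}
```

Run hpa_status with no arguments for the sample HPA, or pass another HPA name as the first argument.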