Horizontal Pod Autoscaler (HPA): Scaling Workloads in Kubernetes

Kubernetes provides a built-in Horizontal Pod Autoscaler (HPA) to scale workloads dynamically. HPA adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed CPU, memory, or custom metrics, balancing performance against cost.

How HPA Works

Monitors resource utilization via the Kubernetes Metrics Server.
Adjusts pod counts based on predefined thresholds.
Continuously evaluates demand and scales accordingly.
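At its core, the controller computes a desired replica count from the ratio of observed to target utilization, roughly desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal sketch of that arithmetic (variable names and sample values are illustrative, not taken from a real cluster):

```shell
# Sketch of the HPA scaling formula using integer ceiling division.
current_replicas=2
current_cpu=90   # observed average CPU utilization across pods, percent
target_cpu=50    # the averageUtilization target from the HPA spec

# ceil(a / b) for positive integers: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"  # → 4
```

With 2 replicas averaging 90% CPU against a 50% target, the HPA would scale the workload to 4 replicas.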

Setting Up HPA

Install Metrics Server

HPA needs a source of resource metrics; for CPU and memory, install the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
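Before moving on, it is worth confirming the Metrics Server is actually serving data (the pod can take a minute or two to become ready):

```shell
# Check that the metrics-server Deployment is available
kubectl get deployment metrics-server -n kube-system

# If it is working, node resource usage should be reported
kubectl top nodes
```

If `kubectl top nodes` returns an error, the HPA will not be able to read CPU utilization and will report its targets as "unknown".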

Deploy a Sample Application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: alpine
        # Install stress-ng and burn one CPU core for 10 minutes to generate load
        command: ["/bin/sh"]
        args: ["-c", "apk add --no-cache stress-ng && stress-ng --cpu 1 --timeout 600s"]
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
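The Deployment manifest above still needs to be applied before the HPA can target it. Assuming it is saved as my-app.yaml (the filename is an assumption), deploy and verify it:

```shell
# Create the Deployment and wait for its pods to come up
kubectl apply -f my-app.yaml
kubectl get pods -l app=my-app
```

The CPU request of 200m matters here: HPA utilization targets are expressed as a percentage of the requested CPU, so pods without requests cannot be autoscaled on utilization.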

Configure HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Save the manifest as my-app-hpa.yaml and apply it:
kubectl apply -f my-app-hpa.yaml
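Equivalently, the same autoscaler can be created imperatively, which is handy for quick experiments:

```shell
# Shortcut for the manifest above: 50% average CPU target, 2-10 replicas
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
```

The declarative manifest remains preferable for anything checked into version control.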

Monitoring HPA

Watch HPA activity:
kubectl get hpa --watch

Monitor pod scaling:
kubectl get pods -w
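When the HPA scales unexpectedly (or refuses to scale), its conditions and recent events explain the decision:

```shell
# Shows current vs. target metrics, scaling conditions, and recent events
kubectl describe hpa my-app-hpa
```

Look for the ScalingActive and AbleToScale conditions; "unknown" metric values usually point back to a missing or unhealthy Metrics Server.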