HPA & VPA Deep Dive

🎯 Learning Objectives

  • Master Horizontal Pod Autoscaler (HPA)
  • Understand Vertical Pod Autoscaler (VPA)
  • Learn advanced autoscaling patterns
  • Troubleshoot autoscaling issues
  • Optimize autoscaling configurations

Autoscaling adjusts a workload's resources as demand changes. HPA adds or removes pod replicas; VPA adjusts the CPU and memory requests of individual pods. Understanding both is essential for cost optimization and performance.

Autoscaling Benefits

Autoscaling matches capacity to load: it adds capacity during traffic peaks to maintain performance and releases it during quiet periods to reduce cost.

Scaling Limits

Always set minReplicas and maxReplicas. The minimum keeps enough capacity for baseline traffic; the maximum caps cost and stops a runaway metric from exhausting cluster resources.

Horizontal Pod Autoscaler (HPA)

Basic HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
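  minReplicas: 2   # floor: never scale below 2 pods
  maxReplicas: 10  # ceiling: never scale above 10 pods
  metrics:
  # Target 70% average CPU utilization, measured against each pod's CPU request
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Target 80% average memory utilization, measured against each pod's memory request
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80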
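
To make the targets above concrete, this is the scaling rule given in the Kubernetes documentation, followed by a worked line. The numbers in the worked line are assumptions chosen for illustration (4 current replicas averaging 90% CPU utilization against the 70% target), not measurements.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

When several metrics are listed, as in web-hpa, the controller evaluates each one separately, takes the largest resulting replica count, and clamps it to the minReplicas to maxReplicas range. By default it makes no change when the ratio of current to desired value is within 10% of 1.0.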

HPA Metrics

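HPA can scale on five metric types: Resource (CPU or memory summed across a pod's containers), ContainerResource (CPU or memory of one named container), Pods (a custom metric averaged across pods), Object (a metric describing another Kubernetes object, such as an Ingress), and External (a metric from outside the cluster, such as a message queue).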
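
As an illustration of the Object type, here is a minimal sketch that scales the web Deployment on the request rate reported for an Ingress. The HPA name, the Ingress name main-route, and the metric name requests-per-second are hypothetical; the metric only exists if a custom metrics adapter in the cluster serves it.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-rps-hpa             # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metric:
        name: requests-per-second   # hypothetical metric, served by an adapter
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route            # hypothetical Ingress
      target:
        type: Value
        value: "2k"                 # total requests per second seen by the Ingress
```

With a Value target the metric is compared directly to the target rather than being averaged per pod.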

Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
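  metrics:
  # Pods metric: averaged across all pods of the target, then compared to the target value
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"  # aim for an average of 100 requests per second per pod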

Custom Metrics

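Custom metrics let HPA scale on application-specific signals such as request rate or queue depth. They are not built in: a metrics adapter (for example, Prometheus Adapter) must expose them through the custom.metrics.k8s.io or external.metrics.k8s.io API.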
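
For the queue-depth case, the metric usually comes from a system outside the cluster, so the External type applies. The following is a sketch under stated assumptions: the worker Deployment, the metric name queue_messages_ready, and the queue label are all hypothetical and depend on what the installed external metrics adapter exposes.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-queue-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                    # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready  # hypothetical metric name
        selector:
          matchLabels:
            queue: worker_tasks     # hypothetical label selecting one queue
      target:
        type: AverageValue
        averageValue: "30"          # aim for about 30 queued messages per pod
```

With AverageValue, the external metric is divided by the current number of pods before it is compared to the target.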

Vertical Pod Autoscaler (VPA)

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

VPA Modes

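  • Off: Computes recommendations only; never changes pods
  • Initial: Applies recommended requests only when a pod is created; never updates it afterwards
  • Auto: Lets VPA choose how to apply updates; in current releases it behaves like Recreate, so pods are evicted and recreated
  • Recreate: Applies recommendations at pod creation and evicts running pods whose requests differ significantly from the recommendation, so that they restart with the new values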
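
A low-risk way to start is Off mode, which collects recommendations without evicting anything. This minimal sketch targets the same web Deployment as above; the object name web-vpa-recommend-only is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa-recommend-only   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"            # recommend only; never evict or modify pods
```

The recommendations then appear in the object's status and can be applied by hand to the Deployment's resource requests.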

Troubleshooting

HPA Not Scaling

# Check HPA status
kubectl get hpa

# Describe HPA
kubectl describe hpa <hpa-name>

# Check metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods

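# Check HPA controller logs (the HPA controller runs inside kube-controller-manager;
# on managed control planes these logs are usually not reachable with kubectl)
kubectl logs -n kube-system <kube-controller-manager-pod>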

Troubleshooting Steps

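  1. Verify metrics are available (metrics-server is running and kubectl top pods returns values)
  2. Check the HPA configuration, including metric names and target types
  3. Verify that the resource named in scaleTargetRef exists in the same namespace
  4. Review the kube-controller-manager logs and the HPA's events for errors
  5. Confirm that every container sets resource requests; Utilization targets are percentages of requests, not limits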
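
Step 5 is a frequent cause of an HPA that reports unknown utilization. As a sketch of what the HPA needs, here is a minimal web Deployment whose container declares CPU and memory requests. The labels, the image, and the request values are placeholders chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web                 # placeholder label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:stable    # placeholder image
        resources:
          requests:
            cpu: 250m          # 70% utilization target means about 175m per pod
            memory: 256Mi      # 80% utilization target means about 205Mi per pod
          limits:
            memory: 512Mi
```

If a container in the pod has no request for the metric being scaled on, the controller cannot compute a utilization percentage for that pod.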

VPA Issues

# Check VPA status
kubectl get vpa

# Check VPA recommendations
kubectl describe vpa <vpa-name>

# Check VPA recommender logs
kubectl logs -n kube-system <vpa-recommender-pod>

VPA Recommendations

The VPA recommender needs usage history before its recommendations stabilize. Let it observe the workload for at least several hours, ideally across a full traffic cycle, before acting on them.
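
When reading the output of kubectl describe vpa, it helps to know the shape of the recommendation. The fragment below sketches the status section of a VPA object; the field names follow the autoscaling.k8s.io/v1 API, but every value is made up for illustration.

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: web
      lowerBound:          # below this, the pod is considered under-provisioned
        cpu: 150m
        memory: 200Mi
      target:              # the requests VPA would set
        cpu: 250m
        memory: 300Mi
      uncappedTarget:      # the target before minAllowed/maxAllowed are applied
        cpu: 250m
        memory: 300Mi
      upperBound:          # above this, the pod is considered over-provisioned
        cpu: "1"
        memory: 1Gi
```

A wide gap between lowerBound and upperBound usually means the recommender has not yet seen enough history.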

Best Practices

Production Recommendations

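  1. Set minReplicas and maxReplicas from measured baseline and peak load
  2. Use multiple metrics for HPA so that a single saturated resource is not missed
  3. Test scaling behavior under load before relying on it in production
  4. Monitor autoscaling events (kubectl describe hpa shows recent scaling decisions)
  5. Use VPA in Off mode for right-sizing recommendations; do not let VPA update pods of a workload whose HPA scales on CPU or memory
  6. Document autoscaling policies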
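
One way to act on items 1 and 3 is the behavior field of autoscaling/v2, which limits how quickly an HPA may add or remove replicas. The following is a sketch, not a tuned configuration: the window and policy values are assumptions to adjust after load testing.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # act on the highest recommendation of the last 5 minutes
      policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas
        periodSeconds: 60              # per 60-second period
    scaleUp:
      stabilizationWindowSeconds: 0    # react to load increases immediately
      policies:
      - type: Pods
        value: 4                       # add at most 4 pods
        periodSeconds: 60              # per 60-second period
```

A scale-down window damps replica flapping when load oscillates around the target, at the cost of holding extra capacity for a few minutes.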
