HPA & VPA Deep Dive¶

🎯 Learning Objectives

Master Horizontal Pod Autoscaler (HPA)
Understand Vertical Pod Autoscaler (VPA)
Learn advanced autoscaling patterns
Troubleshoot autoscaling issues
Optimize autoscaling configurations

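Autoscaling adjusts the resources given to a workload as demand changes. The Horizontal Pod Autoscaler (HPA) changes the number of pod replicas, while the Vertical Pod Autoscaler (VPA) changes the CPU and memory requests of each pod. Understanding both is essential for cost optimization and performance.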

Autoscaling Benefits

Autoscaling optimizes resource usage, reduces costs, and maintains performance under varying load.

Scaling Limits

Set minReplicas high enough to absorb a sudden spike while new pods start, and maxReplicas low enough that a runaway metric cannot exhaust cluster capacity or budget.

Horizontal Pod Autoscaler (HPA)¶

Basic HPA¶

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
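
The manifest above bounds how many replicas may exist; how fast the count changes can be bounded as well through the `behavior` field of autoscaling/v2. The following is a minimal sketch of `web-hpa` with rate limits added. Every number under `behavior` is an illustrative assumption rather than a tuned value, and the memory metric is left out only to keep the example short.

```yaml
# Sketch: web-hpa with scaling rate limits (values are assumptions).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      # React immediately, but add at most 4 pods per minute.
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
    scaleDown:
      # Wait 5 minutes of sustained low load, then remove
      # at most half of the current pods per minute.
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
```

A longer scale-down window trades some cost for stability: replicas are kept through short dips in traffic instead of being removed and recreated.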

HPA Metrics

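HPA can scale on resource metrics (CPU, memory), custom per-pod metrics, object metrics (a metric describing another Kubernetes object, such as an Ingress), and external metrics (from a system outside the cluster). In autoscaling/v2 these correspond to the metric types Resource, Pods, Object, and External.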
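
When several metrics are listed, as in `web-hpa`, the controller computes a desired replica count for each metric and uses the largest one, clamped to the min/max bounds. The worked example below applies the standard HPA calculation to assumed numbers (4 current replicas, 90% average CPU utilization, 60% average memory utilization); the observed values are illustrative only.

```latex
\begin{align*}
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil \\[4pt]
\text{CPU (target 70\%)} &: \left\lceil 4 \times \tfrac{90}{70} \right\rceil
  = \lceil 5.14 \rceil = 6 \\
\text{Memory (target 80\%)} &: \left\lceil 4 \times \tfrac{60}{80} \right\rceil
  = \lceil 3 \rceil = 3 \\
\text{Result} &: \max(6,\, 3) = 6 \text{ replicas (within } 2 \le 6 \le 10\text{)}
\end{align*}
```

Because the largest result wins, a single metric above its target is enough to trigger a scale-up, while a scale-down requires every metric to call for fewer replicas.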

Custom Metrics¶

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Custom Metrics

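Custom metrics enable scaling on application-specific signals such as request rate or queue depth. The HPA reads them through the custom metrics API (custom.metrics.k8s.io), so a metrics adapter such as Prometheus Adapter must be installed and must expose the metric named in the manifest (here, requests_per_second).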
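
Queue depth usually comes from a system outside the cluster, which autoscaling/v2 models as an External metric rather than a Pods metric. The manifest below is a sketch under stated assumptions: the HPA name, the `worker` Deployment, the metric name `queue_messages_ready`, its `queue` label, and the target of 30 messages per pod are all hypothetical, and an adapter serving the external metrics API (external.metrics.k8s.io) is assumed to be installed.

```yaml
# Sketch: scale workers on queue depth (all names are hypothetical).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: External
    external:
      metric:
        # Metric and label as exposed by the assumed adapter.
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: worker_tasks
      target:
        # Total queue depth divided by the number of pods.
        type: AverageValue
        averageValue: "30"
```

With `AverageValue`, the total metric value is divided by the current replica count, so the HPA adds workers until each pod is responsible for roughly 30 queued messages.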

Vertical Pod Autoscaler (VPA)¶

VPA Configuration¶

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

VPA Modes

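Off: Computes recommendations only and never changes pods
Initial: Applies the recommended requests only when a pod is created
Recreate: Applies requests at pod creation and evicts running pods whose requests differ significantly from the recommendation
Auto: In current VPA releases, behaves like Recreate, so pods are restarted to apply changes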
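
A recommendation-only setup is a low-risk way to start, because no pods are evicted. The sketch below reuses the `web` Deployment from the example above; the VPA name `web-vpa-recommend` is hypothetical. It also avoids a conflict worth noting: the VPA project advises against running an updating VPA together with an HPA that scales the same workload on CPU or memory.

```yaml
# Sketch: VPA that only reports recommendations (name is hypothetical).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa-recommend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    # "Off" means no pod is ever evicted or modified.
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: web
      controlledResources: ["cpu", "memory"]
```

The recommendations appear under `status.recommendation.containerRecommendations` of the VPA object (with `lowerBound`, `target`, and `upperBound` values per container) and can be copied into the Deployment's resource requests by hand.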

Troubleshooting¶

HPA Not Scaling¶

# Check HPA status
kubectl get hpa

# Describe HPA
kubectl describe hpa <hpa-name>

# Check metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<ns>/pods

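# Check HPA controller logs (the HPA controller runs inside
# kube-controller-manager; managed clusters may not expose these logs)
kubectl logs -n kube-system <kube-controller-manager-pod>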

Troubleshooting Steps

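Verify metrics are available (metrics-server for CPU and memory, a metrics adapter for custom and external metrics)
Check the HPA configuration, and the Conditions and Events shown by kubectl describe hpa
Verify the target resource named in scaleTargetRef exists in the same namespace as the HPA
Review HPA controller logs
Check that every container sets resource requests; Utilization targets are a percentage of the request, so the metric shows as <unknown> without one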
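
The last step is the most common cause of an HPA that reports `<unknown>` for a Utilization target. As a sketch, a `web` Deployment that satisfies it could look like the following; the label, image, and request sizes are placeholder assumptions.

```yaml
# Sketch: Deployment whose container sets the requests that
# web-hpa's Utilization targets are measured against.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  # spec.replicas is omitted so the HPA owns the replica count.
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27   # placeholder image
        resources:
          requests:
            cpu: 250m       # 70% target = 175m average per pod
            memory: 256Mi   # 80% target = about 205Mi average per pod
          limits:
            memory: 512Mi
```

Utilization is computed against requests, not limits, so raising or lowering a request shifts the point at which the HPA starts to scale.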

VPA Issues¶

# Check VPA status
kubectl get vpa

# Check VPA recommendations
kubectl describe vpa <vpa-name>

# Check VPA recommender logs
kubectl logs -n kube-system <vpa-recommender-pod>

VPA Recommendations

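VPA builds its recommendations from observed usage history, so a newly created VPA has little to go on. Let it collect metrics for at least several hours of representative traffic before relying on its recommendations.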

Best Practices¶

Production Recommendations

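Set min/max replicas based on measured baseline and peak load
Use multiple metrics for HPA; the controller scales to the highest replica count any metric calls for
Test scaling behavior under load
Monitor autoscaling events
Use VPA for right-sizing recommendations, and avoid running it in an updating mode alongside an HPA that scales the same workload on CPU or memory
Document autoscaling policies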

Next Chapter: Advanced Security Hardening