Lab: Making the Voting App Auto-Scale

Objectives

By the end of this lab, you will be able to:

Install and verify Metrics Server in your KIND cluster
Configure HPA for vote and result services with CPU targets
Generate load and observe autoscaling behavior in real-time
Understand VPA recommendations for right-sizing resource requests
Set up KEDA for event-driven worker scaling based on Redis queue length

Prerequisites

Before starting this lab, ensure you have:

Completed Module 0 and Module 1
A running KIND cluster with 2 worker nodes (standard 3-node cluster)
Example Voting App deployed with scheduling rules from Module 1
kubectl CLI configured to communicate with your cluster
Basic understanding of resource requests and limits

Setup

Follow these steps to prepare your environment for this lab.

Step 1: Verify cluster status

kubectl cluster-info
kubectl get nodes

Expected output:

Kubernetes control plane is running at https://127.0.0.1:xxxxx

NAME                       STATUS   ROLES           AGE   VERSION
voting-app-control-plane   Ready    control-plane   10d   v1.32.0
voting-app-worker          Ready    <none>          10d   v1.32.0
voting-app-worker2         Ready    <none>          10d   v1.32.0

Step 2: Verify Example Voting App is running

kubectl get pods
kubectl get deployments

You should see vote, result, worker, redis, and postgres pods running. If the Voting App is not deployed, apply the base YAMLs from Module 0:

kubectl apply -f examples/voting-app/

Step 3: Add resource requests to vote and result Deployments

HPA requires resource requests to calculate utilization. The base Voting App from Module 0 does not have resource requests set. Update the vote and result Deployments:

Create a file named vote-resources.yaml:

vote-resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vote
spec:
  template:
    spec:
      containers:
      - name: vote
        image: schoolofdevops/vote:v1
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi

Create a file named result-resources.yaml:

result-resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: result
spec:
  template:
    spec:
      containers:
      - name: result
        image: dockersamples/examplevotingapp_result
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi

Apply the patches:

kubectl patch deployment vote --patch "$(cat vote-resources.yaml)"
kubectl patch deployment result --patch "$(cat result-resources.yaml)"

Verify resource requests are set:

kubectl describe deployment vote | grep -A 5 "Requests:"
kubectl describe deployment result | grep -A 5 "Requests:"

You should see:

    Requests:
      cpu:        100m
      memory:     128Mi
    Limits:
      cpu:        200m
      memory:     256Mi

Tasks

Task 1: Install Metrics Server

Metrics Server provides the Metrics API required by HPA. KIND does not include Metrics Server by default, so we must install it manually.

Step 1: Apply Metrics Server YAML

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Expected output:

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Step 2: Patch Metrics Server for KIND

KIND uses self-signed certificates for kubelet TLS. Metrics Server will fail to connect to kubelets without a patch. Add the --kubelet-insecure-tls flag:

kubectl patch -n kube-system deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

This tells Metrics Server to skip TLS certificate verification when connecting to kubelets.

Step 3: Wait for Metrics Server to be ready

kubectl wait --for=condition=available --timeout=300s deployment/metrics-server -n kube-system

Step 4: Verify Metrics Server is working

kubectl top nodes

Wait 60 seconds if you see an error. Metrics Server needs time to collect initial metrics. After the collection period:

kubectl top nodes
kubectl top pods

Expected output:

NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
voting-app-control-plane   150m         3%     800Mi           10%
voting-app-worker          100m         2%     600Mi           7%
voting-app-worker2         80m          2%     550Mi           6%

NAME                      CPU(cores)   MEMORY(bytes)
vote-xxxxxxxxxx-xxxxx     2m           50Mi
result-xxxxxxxxxx-xxxxx   3m           55Mi
worker-xxxxxxxxxx-xxxxx   5m           60Mi
redis-xxxxxxxxxx-xxxxx    4m           20Mi
postgres-xxxxxxxxxx-xxxxx 8m           100Mi

If kubectl top returns metrics, Metrics Server is working correctly.

Task 2: Configure HPA for Vote Service

Now that Metrics Server is running, create an HPA to automatically scale the vote service based on CPU utilization.

Step 1: Create HPA for vote service

Create a file named hpa-vote.yaml:

hpa-vote.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vote-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vote
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Scale up when CPU > 50%
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 50  # Scale down max 50% of pods at once
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
      - type: Percent
        value: 100  # Can double pods if needed
        periodSeconds: 30
      - type: Pods
        value: 2  # Add max 2 pods per 30s
        periodSeconds: 30
      selectPolicy: Max  # Use policy that adds more pods

Step 2: Apply the HPA

kubectl apply -f hpa-vote.yaml

Expected output:

horizontalpodautoscaler.autoscaling/vote-hpa created

Step 3: Verify HPA is working

kubectl get hpa

Expected output:

NAME       REFERENCE         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
vote-hpa   Deployment/vote   15%/50%   2         8         2          30s

The TARGETS column shows current/target CPU utilization. If you see <unknown>/50%, wait 60 seconds for Metrics Server to collect data.

Step 4: Watch HPA status

kubectl get hpa vote-hpa --watch

This command watches HPA status in real-time. You should see the current CPU utilization updating every 15 seconds. Leave this running in a terminal window. We will observe scaling in the next task.

Task 3: Generate Load and Observe Scaling

Now generate load on the vote service to trigger HPA scaling.

Step 1: Get vote service endpoint

VOTE_SERVICE_IP=$(kubectl get svc vote -o jsonpath='{.spec.clusterIP}')
VOTE_SERVICE_PORT=$(kubectl get svc vote -o jsonpath='{.spec.ports[0].port}')
echo "Vote service: http://$VOTE_SERVICE_IP:$VOTE_SERVICE_PORT"

Step 2: Deploy load generator

Create a pod that continuously sends requests to the vote service:

kubectl run load-generator \
  --image=busybox:1.36 \
  --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://$VOTE_SERVICE_IP:$VOTE_SERVICE_PORT; done"

Step 3: Watch HPA scale up

In the terminal window running kubectl get hpa vote-hpa --watch, you should see:

NAME       REFERENCE         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
vote-hpa   Deployment/vote   25%/50%   2         8         2          2m
vote-hpa   Deployment/vote   85%/50%   2         8         2          3m  <- CPU exceeded target
vote-hpa   Deployment/vote   85%/50%   2         8         3          3m  <- scaled up to 3
vote-hpa   Deployment/vote   65%/50%   2         8         3          4m
vote-hpa   Deployment/vote   65%/50%   2         8         4          4m  <- scaled up to 4
vote-hpa   Deployment/vote   48%/50%   2         8         4          5m  <- stabilized

The HPA watches CPU utilization every 15 seconds. When it exceeds the target (50%), HPA adds replicas to bring utilization back below the target.

Step 4: Verify replicas increased

kubectl get pods -l app=vote

You should see at least 3-4 vote pods running.

Step 5: Stop load generation and watch scale down

kubectl delete pod load-generator

Watch the HPA in the other terminal. It will NOT scale down immediately. The stabilization window (300 seconds = 5 minutes) prevents rapid scale-down. After 5 minutes, you will see:

NAME       REFERENCE         TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
vote-hpa   Deployment/vote   5%/50%    2         8         4          8m
vote-hpa   Deployment/vote   5%/50%    2         8         3          8m  <- scaled down to 3
vote-hpa   Deployment/vote   5%/50%    2         8         2          9m  <- scaled down to 2 (minReplicas)

The stabilization window prevents flapping (rapid scale up and down). This is critical for production stability.

Step 6: Functional verification

Verify the vote service still works correctly:

kubectl port-forward svc/vote 8080:80 &

Open your browser to http://localhost:8080 and submit a vote. The vote should register correctly. Stop the port-forward:

pkill -f "port-forward svc/vote"

Task 4: Create HPA for Result Service

Repeat the HPA configuration for the result service.

Step 1: Create HPA for result service

Create a file named hpa-result.yaml:

hpa-result.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: result-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: result
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
      selectPolicy: Max

Step 2: Apply and verify

kubectl apply -f hpa-result.yaml
kubectl get hpa

You should see both vote-hpa and result-hpa listed.

Task 5: KEDA for Event-Driven Worker Scaling

The worker service consumes votes from the Redis queue. CPU usage is a poor indicator of actual work. If the queue has 10,000 votes but the worker is idle waiting for Redis, HPA will not scale up. KEDA solves this by scaling based on queue depth.

Step 1: Install KEDA

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.14.0/keda-2.14.0.yaml

Expected output:

namespace/keda created
customresourcedefinition.apiextensions.k8s.io/clustertriggerauthentications.keda.sh created
customresourcedefinition.apiextensions.k8s.io/scaledjobs.keda.sh created
customresourcedefinition.apiextensions.k8s.io/scaledobjects.keda.sh created
customresourcedefinition.apiextensions.k8s.io/triggerauthentications.keda.sh created
serviceaccount/keda-operator created
clusterrole.rbac.authorization.k8s.io/keda-operator created
clusterrolebinding.rbac.authorization.k8s.io/keda-operator created
role.rbac.authorization.k8s.io/keda-operator created
rolebinding.rbac.authorization.k8s.io/keda-operator created
service/keda-operator created
service/keda-operator-metrics-apiserver created
deployment.apps/keda-operator created
deployment.apps/keda-operator-metrics-apiserver created

Step 2: Wait for KEDA operator to be ready

kubectl wait --for=condition=available --timeout=300s deployment/keda-operator -n keda
kubectl wait --for=condition=available --timeout=300s deployment/keda-operator-metrics-apiserver -n keda

Step 3: Create ScaledObject for worker

Create a file named keda-worker.yaml:

keda-worker.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
spec:
  scaleTargetRef:
    name: worker  # Target the worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: redis
    metadata:
      address: redis:6379
      listName: votes
      listLength: "5"  # Scale when queue > 5 messages

Apply the ScaledObject:

kubectl apply -f keda-worker.yaml

Step 4: Verify KEDA ScaledObject

kubectl get scaledobject

Expected output:

NAME                  SCALETARGETKIND      SCALETARGETNAME   MIN   MAX   TRIGGERS   AUTHENTICATION   READY   ACTIVE   FALLBACK   AGE
worker-scaledobject   apps/v1.Deployment   worker            1     5     redis                       True    False    False      30s

Step 5: Verify KEDA created an HPA

kubectl get hpa

You should see an HPA named keda-hpa-worker-scaledobject automatically created by KEDA:

NAME                            REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
vote-hpa                        Deployment/vote      5%/50%      2         8         2          15m
result-hpa                      Deployment/result    3%/50%      2         8         2          10m
keda-hpa-worker-scaledobject    Deployment/worker    0/5 (avg)   1         5         1          2m

The TARGETS column shows current queue length vs. target (5 messages). KEDA translates Redis list length into an HPA metric.

Step 6: Demonstrate worker scaling

Submit many votes rapidly to fill the Redis queue:

kubectl port-forward svc/vote 8080:80 &

In another terminal, use curl to submit votes in a loop:

for i in {1..50}; do
  curl -s -X POST http://localhost:8080 -d "vote=a" > /dev/null
  echo "Submitted vote $i"
done

Watch the worker scale up as the queue grows:

kubectl get hpa keda-hpa-worker-scaledobject --watch

You should see the TARGETS increase as votes accumulate in Redis, and the worker replicas increase accordingly:

NAME                            REFERENCE            TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-worker-scaledobject    Deployment/worker    12/5 (avg)   1         5         1          5m
keda-hpa-worker-scaledobject    Deployment/worker    12/5 (avg)   1         5         3          5m  <- scaled up
keda-hpa-worker-scaledobject    Deployment/worker    8/5 (avg)    1         5         3          6m
keda-hpa-worker-scaledobject    Deployment/worker    2/5 (avg)    1         5         3          7m
keda-hpa-worker-scaledobject    Deployment/worker    0/5 (avg)    1         5         1          10m <- scaled down

Stop the port-forward:

pkill -f "port-forward svc/vote"

KEDA scales the worker based on actual work in the queue, not CPU usage. This is more efficient for queue consumers.

Challenge: Understanding the HPA/VPA Conflict

This challenge is conceptual, demonstrating why HPA and VPA should not target the same Deployment for the same metric.

Scenario:

Imagine you apply both HPA (targeting 70% CPU utilization) and VPA (in Auto mode) to the vote Deployment.

Vote pod uses 80m CPU with a request of 100m CPU. Utilization: 80%.
HPA sees 80% > 70% target and adds replicas.
VPA observes consistent 80m usage and reduces request to 80m CPU (optimizing resources).
Now the same 80m usage becomes 80m / 80m = 100% utilization.
HPA sees 100% > 70% target and adds more replicas.
The load distributes, and usage per pod drops to 50m.
VPA sees lower usage and reduces requests to 50m.
HPA sees 50m / 50m = 100% utilization again and scales up.
This cycle repeats, causing thrashing.

Key learning:

Never run HPA and VPA together on the same Deployment for the same resource. The solution:

Use VPA in Off mode (recommendation only)
Review VPA recommendations: kubectl describe vpa worker-vpa
Apply recommended resource requests manually to your Deployment
Then create HPA for horizontal scaling based on the new requests

This gives you right-sized pods with dynamic replica counts.

Verification

Confirm your lab setup is working correctly:

1. Metrics Server is running and returning metrics

kubectl top nodes
kubectl top pods

Expected: Both commands return resource usage data without errors.

2. HPA resources exist and show valid targets

kubectl get hpa

Expected output:

NAME                            REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS
vote-hpa                        Deployment/vote      5%/50%      2         8         2
result-hpa                      Deployment/result    3%/50%      2         8         2
keda-hpa-worker-scaledobject    Deployment/worker    0/5 (avg)   1         5         1

TARGETS should show current/target values, not <unknown>.

3. Vote service scaled up under load

During Task 3, you observed vote replicas increase from 2 to at least 3-4 when load was applied.

4. Vote service scaled down after load removed

After deleting the load generator, you observed vote replicas return to 2 (minReplicas) after the 5-minute stabilization window.

5. KEDA ScaledObject exists and manages worker scaling

kubectl get scaledobject
kubectl get hpa keda-hpa-worker-scaledobject

Expected: ScaledObject is READY and ACTIVE. HPA was created automatically by KEDA.

6. Voting App still fully functional end-to-end

kubectl port-forward svc/vote 8080:80 &
kubectl port-forward svc/result 8081:80 &

Open browser to http://localhost:8080, submit a vote. Open http://localhost:8081, verify the vote appears in results. Stop port-forwards:

pkill -f "port-forward"

Cleanup

Module 2 completes the 0-1-2 build-up sequence. Modules 3 and 4 start with focused, clean deployments rather than carrying forward autoscaling state. Clean up autoscaling resources but keep the base cluster.

Step 1: Delete HPA resources

kubectl delete hpa vote-hpa
kubectl delete hpa result-hpa

Step 2: Delete KEDA ScaledObject

kubectl delete scaledobject worker-scaledobject

KEDA will automatically delete the HPA it created.

Step 3: Verify cleanup

kubectl get hpa
kubectl get scaledobject

Expected: No resources listed.

Step 4: Keep KIND cluster and base Voting App running

Do not delete the cluster or Voting App. Learners will redeploy fresh from examples/ as needed in subsequent modules.

Troubleshooting

Issue: HPA shows `<unknown>` for TARGETS

Symptom: kubectl get hpa shows <unknown>/50% in the TARGETS column. Pods do not scale.

Cause: Metrics Server is not running, not ready, or the target Deployment has no resource requests.

Solution:

# Check Metrics Server is running
kubectl get deployment -n kube-system metrics-server

# Check Metrics Server logs
kubectl logs -n kube-system deployment/metrics-server

# Verify Metrics API is available
kubectl top nodes

# Verify target Deployment has resource requests
kubectl describe deployment vote | grep -A 5 "Requests:"

If Metrics Server is not running, repeat Task 1. If resource requests are missing, repeat Setup Step 3.

Issue: Pods do not scale up under load

Symptom: Load generator is running, but vote replicas remain at 2. HPA shows low CPU utilization.

Cause: CPU target utilization is too high, or load generator is not generating enough load.

Solution:

# Check current CPU utilization
kubectl top pods -l app=vote

# Check HPA events
kubectl describe hpa vote-hpa

# Verify load generator is running
kubectl get pod load-generator
kubectl logs load-generator

If CPU usage is below the target (50%), increase load by deploying multiple load generator pods:

kubectl run load-generator-2 \
  --image=busybox:1.36 \
  --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://$VOTE_SERVICE_IP:$VOTE_SERVICE_PORT; done"

Or lower the HPA target to 30% to trigger scaling at lower CPU usage.

Issue: KEDA ScaledObject not working

Symptom: kubectl get scaledobject shows READY: False or ACTIVE: False. Worker does not scale based on Redis queue length.

Cause: KEDA operator is not running, Redis connection issue, or incorrect ScaledObject configuration.

Solution:

# Check KEDA operator is running
kubectl get deployment -n keda

# Check KEDA operator logs
kubectl logs -n keda deployment/keda-operator

# Verify Redis service is reachable
kubectl run test-redis --image=redis:alpine --rm -it --restart=Never -- redis-cli -h redis PING

# Check ScaledObject status
kubectl describe scaledobject worker-scaledobject

If KEDA operator is not running, repeat Task 5 Step 1. If Redis is not reachable, verify the Redis service is running: kubectl get svc redis.

Issue: Pods scale down too quickly

Symptom: HPA removes replicas immediately when load drops, then has to scale up again when load returns. This causes instability.

Cause: Stabilization window is too short or not configured.

Solution:

Update the HPA behavior section to increase the scaleDown stabilizationWindowSeconds:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down

Apply the updated HPA:

kubectl apply -f hpa-vote.yaml

This prevents rapid scale-down during temporary traffic dips.

Key Takeaways

Metrics Server provides the Metrics API required by HPA and must be installed separately in KIND clusters
HPA calculates utilization as (current usage / resource request), so resource requests are mandatory for HPA to function
Stabilization windows prevent flapping by scaling up fast and scaling down slowly
Never run HPA and VPA together on the same Deployment for the same resource to avoid thrashing
KEDA extends HPA with event-driven scaling for queue consumers and scheduled workloads, scaling based on actual work rather than CPU usage

Objectives​

Prerequisites​

Setup​

Tasks​

Task 1: Install Metrics Server​

Task 2: Configure HPA for Vote Service​

Task 3: Generate Load and Observe Scaling​

Task 4: Create HPA for Result Service​

Task 5: KEDA for Event-Driven Worker Scaling​

Challenge: Understanding the HPA/VPA Conflict​

Verification​

Cleanup​

Troubleshooting​

Issue: HPA shows <unknown> for TARGETS​

Issue: Pods do not scale up under load​

Issue: KEDA ScaledObject not working​

Issue: Pods scale down too quickly​

Key Takeaways​

Objectives

Prerequisites

Setup

Tasks

Task 1: Install Metrics Server

Task 2: Configure HPA for Vote Service

Task 3: Generate Load and Observe Scaling

Task 4: Create HPA for Result Service

Task 5: KEDA for Event-Driven Worker Scaling

Challenge: Understanding the HPA/VPA Conflict

Verification

Cleanup

Troubleshooting

Issue: HPA shows `<unknown>` for TARGETS

Issue: Pods do not scale up under load

Issue: KEDA ScaledObject not working

Issue: Pods scale down too quickly

Key Takeaways