This tutorial covers how to scale applications with Kubernetes. By the end, you will be able to adjust the number of Pod replicas and use Kubernetes Services for load balancing, effectively managing the load on your applications.
Kubernetes supports two types of scaling: horizontal and vertical. Horizontal scaling changes the number of Pods, while vertical scaling changes the CPU or memory resources allocated to existing Pods.
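As a quick illustration of horizontal scaling, you can change a Deployment's replica count manually (the Deployment name my-app here is just a placeholder):

```shell
# Scale a Deployment to 4 replicas (horizontal scaling)
kubectl scale deployment my-app --replicas=4

# Verify the new replica count
kubectl get deployment my-app
```

Manual scaling like this is useful for predictable load changes; the autoscaler described next handles the dynamic case.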
The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, StatefulSet, or replication controller based on observed metrics such as CPU utilization.
To create a Horizontal Pod Autoscaler, use the kubectl autoscale command:
kubectl autoscale deployment <deployment-name> --min=2 --max=5 --cpu-percent=80
This command creates an autoscaler that keeps between 2 and 5 replicas, scaling to target an average CPU utilization of 80% across the Pods.
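Once created, you can inspect the autoscaler's current state; its name matches the Deployment:

```shell
# Show target vs. current CPU utilization and replica counts
kubectl get hpa <deployment-name>

# Detailed view, including recent scaling events
kubectl describe hpa <deployment-name>
```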
A Kubernetes Service abstracts access to your Pods. It provides a single stable IP address and distributes network traffic across all Pods matching the Service's selector.
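To see this routing in action, you can list the Pod IPs a Service currently selects (the Service name my-service is an assumed placeholder):

```shell
# The Service's single stable IP
kubectl get service my-service

# The Pod IPs behind the Service, matched by its selector
kubectl get endpoints my-service
```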
To create a LoadBalancer Service:
kubectl expose deployment <deployment-name> --type=LoadBalancer --name=<service-name>
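On cloud providers, the external IP of a LoadBalancer Service can take a minute or two to provision; you can watch for it to appear:

```shell
# Wait for EXTERNAL-IP to change from <pending> to an address
kubectl get service <service-name> --watch
```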
# Create a Deployment (in current kubectl versions, `kubectl run` creates a bare Pod, so use `kubectl create deployment` instead)
kubectl create deployment my-app --image=nginx --port=80
# Set a CPU request (required for the HPA to compute utilization)
kubectl set resources deployment my-app --requests=cpu=200m
# Create a Horizontal Pod Autoscaler
kubectl autoscale deployment my-app --min=2 --max=5 --cpu-percent=80
# Expose the Deployment as a Service
kubectl expose deployment my-app --type=LoadBalancer --name=my-service
Create a Deployment with an image of your choice and expose it as a Service of type LoadBalancer.
Create a Horizontal Pod Autoscaler for the Deployment you created in Exercise 1. Set the minimum number of Pods to 3 and the maximum to 10, with a target CPU utilization of 50%.
Check the status of the Horizontal Pod Autoscaler you created in Exercise 2. Try to generate some load on your application and observe how Kubernetes automatically scales the number of Pods.
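As a hint for generating load, one sketch (the Service name my-service and the busybox image are assumptions) is to run a temporary Pod that requests the Service in a loop:

```shell
# Run a throwaway Pod that sends continuous HTTP requests to the Service
kubectl run load-generator --rm -it --image=busybox -- /bin/sh -c \
  "while true; do wget -q -O- http://my-service; done"

# In another terminal, watch the HPA react as CPU utilization rises
kubectl get hpa --watch
```

Press Ctrl+C to stop the load generator; the --rm flag deletes the Pod on exit, and after a cooldown period the HPA scales the replicas back down.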