To manage your infrastructure successfully, you need a monitoring mechanism in place. Monitoring is an important part of infrastructure reliability: it enables you to respond to incidents, identify and debug errors, assure good performance, support your planning, and get an overview of your infrastructure. Implementing monitoring in Kubernetes is quite different from monitoring virtual machines (VMs) and physical machines, so it needs a different mindset. Some of the ways in which Kubernetes monitoring differs from monitoring VMs or physical machines are discussed below.
• In Kubernetes, the use of tags and labels becomes more important. Labels have become prominent because they are the only way to give an identity to pods and their associated containers. There are two types of labels: user defined and system exposed. User-defined labels should provide infrastructure information such as whether a component is front end or back end, its version, its environment, and the type of application. It is these user-defined labels that enable you to navigate across your infrastructure metrics and events. System-exposed labels provide information about pods, nodes, and containers, among others.
• In Kubernetes, you need to monitor a larger number of components. Traditional monitoring only required you to monitor applications and their hosts. In Kubernetes, the components to monitor are hosts, containers, the applications running within containers, and Kubernetes itself.
• In Kubernetes, your monitoring needs to evolve to handle applications that move. You have very limited control over where applications run, so service discovery must be incorporated into monitoring.
• In Kubernetes, applications can be hosted on cloud infrastructure provided by different vendors. This raises the complexity of metric collection and aggregation.
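The label-based navigation described in the first point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pod records and the label keys (`tier`, `env`, `version`, `app`) are hypothetical, and the function mimics equality-based selection rather than calling any Kubernetes API.

```python
# Hypothetical pod records; the label keys and values are illustrative only
# and are not prescribed by Kubernetes.
PODS = [
    {"name": "web-1", "labels": {"tier": "frontend", "env": "prod", "version": "1.2", "app": "shop"}},
    {"name": "web-2", "labels": {"tier": "frontend", "env": "staging", "version": "1.3", "app": "shop"}},
    {"name": "api-1", "labels": {"tier": "backend", "env": "prod", "version": "2.0", "app": "shop"}},
]


def select_pods(pods, selector):
    """Return the names of pods whose labels contain every key/value in selector."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Narrow the view to production front-end pods.
print(select_pods(PODS, {"tier": "frontend", "env": "prod"}))
```

On a live cluster, the same kind of filter is available through kubectl's label selector flag, for example `kubectl get pods -l tier=frontend,env=prod` (assuming your pods carry those labels).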
In the following section, we will discuss the different metrics that need to be monitored.
Assuring proper Kubernetes performance requires checking the health of pod deployments. It is important to monitor the number of desired, available, and unavailable pods. The key condition to watch for is that the number of available pods matches the number of desired pods. Another important metric is the number of running pods, which gives you a picture of how your infrastructure evolves.
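As a minimal sketch of the desired-versus-available check, the Python function below flags deployments that are short of pods. The input records and their field names (`desired`, `available`) are hypothetical; in the Kubernetes API the corresponding values are the deployment's `spec.replicas` and `status.availableReplicas`.

```python
def deployment_shortfalls(deployments):
    """Return (name, desired, available, missing) for deployments short of pods."""
    shortfalls = []
    for deployment in deployments:
        desired = deployment["desired"]
        # A deployment with no available pods may omit the field entirely.
        available = deployment.get("available", 0)
        if available < desired:
            shortfalls.append((deployment["name"], desired, available, desired - available))
    return shortfalls


# Hypothetical snapshot of three deployments.
snapshot = [
    {"name": "frontend", "desired": 3, "available": 3},
    {"name": "backend", "desired": 4, "available": 2},
    {"name": "worker", "desired": 1},
]

for name, desired, available, missing in deployment_shortfalls(snapshot):
    print(f"{name}: {available}/{desired} pods available, {missing} missing")
```

A check like this is a natural alert condition: an empty result means every deployment has as many available pods as it asked for.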
The metrics available for monitoring resource use are CPU usage, node CPU capacity, memory usage, node memory capacity, requests, limits, file system usage, and disk I/O. When you experience performance problems, resource usage metrics are the first place to look. Traditional monitoring was concerned with checking resource use against node capacity. Kubernetes, however, places pods on nodes according to the resources they request, and a pod cannot be scheduled when no node has enough unrequested capacity left, which leaves containers without the resources they need to serve requests. Therefore, monitoring should focus on checking that total requests are less than node capacity. When available resources cannot meet demand, you need to consider increasing node capacity or the number of nodes. You should also monitor the percentage of disk in use over time and raise an alert when it exceeds a threshold of concern.
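The requests-versus-capacity check and the disk threshold alert can be sketched as below. This is an illustrative sketch, not a full implementation: the helper names are hypothetical, the 80% threshold is an arbitrary example, and the parsers only handle a subset of Kubernetes quantity notation (plain or `m`-suffixed CPU values, and `Ki`/`Mi`/`Gi` memory suffixes).

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "512Mi" or "4Gi" to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # assume a plain byte count


def node_headroom(capacity, pod_requests):
    """Return capacity minus the summed requests; negative means overcommitted."""
    requested_cpu = sum(cpu_to_millicores(r["cpu"]) for r in pod_requests)
    requested_memory = sum(memory_to_bytes(r["memory"]) for r in pod_requests)
    return {
        "cpu_millicores": cpu_to_millicores(capacity["cpu"]) - requested_cpu,
        "memory_bytes": memory_to_bytes(capacity["memory"]) - requested_memory,
    }


def disk_alert(used_bytes, total_bytes, threshold_percent=80.0):
    """Return True when disk usage reaches the alert threshold."""
    return 100.0 * used_bytes / total_bytes >= threshold_percent


# Hypothetical node and the requests of the pods placed on it.
node = {"cpu": "2", "memory": "4Gi"}
requests = [{"cpu": "500m", "memory": "1Gi"}, {"cpu": "1", "memory": "512Mi"}]
print(node_headroom(node, requests))
print(disk_alert(used_bytes=85, total_bytes=100))
```

When either headroom value approaches zero, the node has little room left for new pods, which is the signal to add capacity or nodes.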
Another metric category of importance is the network. Important network metrics are network in, network out and network errors. Network metrics provide you with a picture of network load on your infrastructure.
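Network metrics such as bytes in and bytes out are commonly exposed as counters that only increase, so load is read from their rate of change rather than their raw value. The sketch below assumes that counter model and uses hypothetical sample data; the reset handling (treating the new value as the increase) is one simple convention, not the only possible one.

```python
def rate_per_second(previous, current):
    """Compute a per-second rate from two (timestamp_seconds, counter_value) samples."""
    previous_time, previous_value = previous
    current_time, current_value = current
    elapsed = current_time - previous_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if current_value < previous_value:
        # The counter went backwards, for example after a container restart.
        # Treat the current value as the increase since the reset.
        return current_value / elapsed
    return (current_value - previous_value) / elapsed


# Hypothetical samples of a received-bytes counter taken 10 seconds apart.
print(rate_per_second((0, 1000), (10, 6000)))
```

The same function applies to error counters; a sustained non-zero error rate is usually more interesting than the absolute error count.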
To support you in metric collection, some of the tools available are Heapster, Prometheus, and the Kubernetes dashboard. This article is limited to discussing those three tools. To demonstrate how the monitoring tools are used, let us create a container cluster on Google Cloud. Log in to your console, click on Container Engine, and then create a container cluster. Provide a cluster name, zone, machine type, and cluster size.
Before we can connect to our cluster, we need to configure kubectl and start a proxy using the commands below.
gcloud container clusters get-credentials metric-collection \
--zone us-central1-a --project vagrant-set-up
gcloud auth application-default login
kubectl proxy
Navigating to http://localhost:8001/ui leads you to the Kubernetes dashboard, from where you can manage your cluster. After you deploy your applications, their metrics will be available in the pods tab.
Let us deploy a sample application. Save the contents of the YAML file at https://github.com/GoogleCloudPlatform/deploymentmanager-samples/blob/master/examples/v2/container_vm/jinja/container_vm.yaml locally and deploy it using the Kubernetes dashboard.
To collect metrics, Prometheus is deployed as a pod in your cluster. To deploy Prometheus, you need the Prometheus pod deployment and a ConfigMap that specifies how Prometheus connects to the cluster. The pod deployment and the ConfigMap are contained in a manifest file, which can be deployed in a Kubernetes cluster using the command below.
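The proxy on port 8001 exposes the Kubernetes API as well as the dashboard, so pod state can also be read programmatically. The sketch below assumes the proxy is listening at http://localhost:8001 and reads the core API path `/api/v1/pods`; the function names and the sample payload are hypothetical, and the fetch function is defined but not called so the example runs without a cluster.

```python
import json
from collections import Counter
from urllib.request import urlopen


def count_pod_phases(pod_list):
    """Count pods per phase (Running, Pending, ...) in a pod list response."""
    return Counter(item["status"]["phase"] for item in pod_list.get("items", []))


def fetch_pod_list(base_url="http://localhost:8001"):
    """Fetch all pods through the local proxy (requires a reachable cluster)."""
    with urlopen(base_url + "/api/v1/pods") as response:
        return json.load(response)


# Hypothetical, heavily trimmed response used so the example runs offline.
sample_pod_list = {
    "items": [
        {"metadata": {"name": "web-1"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "web-2"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "job-1"}, "status": {"phase": "Pending"}},
    ]
}

phases = count_pod_phases(sample_pod_list)
print(phases["Running"], "running,", phases["Pending"], "pending")
```

With a cluster and the proxy running, `count_pod_phases(fetch_pod_list())` would give the number of running pods mentioned earlier as a picture of infrastructure evolution.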
kubectl create -f https://raw.githubusercontent.com/coreos/blog-examples/master/monitoring-kubernetes-with-prometheus/prometheus.yml
If you are using the Kubernetes dashboard to manage your cluster, you can save the manifest locally and deploy it.
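Once Prometheus is running, metrics can be read back through its HTTP query API. The sketch below is a hedged illustration: the base URL http://localhost:9090 is an assumption (how the Prometheus pod is exposed depends on the manifest and on any port forwarding you set up), the helper names are hypothetical, and the sample payload only mirrors the shape of an instant-query response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_instant_vector(payload):
    """Return (labels, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error")))
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


def run_query(base_url, promql):
    """Execute a query (requires a reachable Prometheus server)."""
    with urlopen(build_query_url(base_url, promql)) as response:
        return parse_instant_vector(json.load(response))


# Hypothetical response for the built-in "up" metric, used to run offline.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "kubernetes-nodes"}, "value": [1500000000, "1"]},
        ],
    },
}

print(build_query_url("http://localhost:9090", "up"))
print(parse_instant_vector(sample_response))
```

The `job` label value in the sample is made up; the real label set depends on the scrape configuration in the ConfigMap.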
To use Heapster, you deploy it as a pod in your cluster, from where it collects metrics on pods and nodes and stores them in your preferred backend. Once deployed, Heapster gathers metrics from every node by querying its Kubelet. Heapster metadata includes labels, which enable you to customize reports on nodes. When you are using Google Compute Engine (GCE), it is very easy to set up Heapster because there is a dashboard you can hook into. When your cluster is not hosted on GCE, or you are not interested in using the dashboard provided by GCE, you can set up Heapster using InfluxDB and Grafana.
In Kubernetes clusters, InfluxDB is the most commonly used Heapster backend. InfluxDB was developed with the specific objective of handling time series observations while providing high performance and availability. To visualize the collected metrics, you use a Grafana dashboard, which is set up alongside InfluxDB.
The commands below set up InfluxDB and Grafana and then request information about the cluster.
git clone https://github.com/kubernetes/heapster.git
cd heapster
kubectl create -f deploy/kube-config/influxdb/
kubectl cluster-info
In this post, we introduced the importance of monitoring in ensuring your infrastructure performs as you need it to. We discussed how monitoring a Kubernetes environment differs from monitoring a VM or physical environment, and which metrics need to be collected. Finally, we discussed how to set up and use Prometheus, Heapster, and the Kubernetes dashboard for metric collection and visualization.