Keeping an eagle eye on Kubernetes services
In this story, I am going to show you how I set up a dashboard for monitoring Kubernetes services.
Kubernetes now runs in more than 70 percent of container environments, and monitoring the services running on it matters for several reasons:
- High availability - Regular application health checks and effective monitoring let you detect issues before they become full-fledged outages.
- Optimal resource allocation - Roughly 49 percent of containers use less than 30 percent of their requested CPU, and 45 percent use less than 30 percent of their allotted memory. Idle resources are expensive and add no value to your ecosystem; real-time monitoring surfaces over-provisioned containers so their allocations can be right-sized.
- Troubleshooting - With so many containers being orchestrated, it is nearly impossible to watch each one by hand and spot the ones that are failing.
- Alerting - Finally, business continuity demands an immediate response when an incident occurs, and alerting is the first step toward remediation.
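The idle-capacity figures above come down to simple arithmetic: usage divided by request. The Python sketch below flags containers using less than 30 percent of their requested CPU or memory. It is only an illustration: the `ContainerUsage` type, the `find_idle` helper and the sample numbers are hypothetical, and in practice the inputs would come from your monitoring data rather than being typed in.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    """Hypothetical per-container figures; real values would come from monitoring data."""
    name: str
    cpu_requested_cores: float
    cpu_used_cores: float
    memory_requested_bytes: int
    memory_used_bytes: int


def usage_ratio(used: float, requested: float) -> float:
    """Fraction of the request actually in use (0.25 means 25 percent)."""
    if requested <= 0:
        raise ValueError("request must be positive")
    return used / requested


def find_idle(containers: list, threshold: float = 0.30) -> list:
    """Return (name, cpu_ratio, memory_ratio) for containers below the threshold on CPU or memory."""
    idle = []
    for c in containers:
        cpu = usage_ratio(c.cpu_used_cores, c.cpu_requested_cores)
        mem = usage_ratio(c.memory_used_bytes, c.memory_requested_bytes)
        if cpu < threshold or mem < threshold:
            idle.append((c.name, round(cpu, 2), round(mem, 2)))
    return idle


if __name__ == "__main__":
    MiB = 2**20
    # Hypothetical sample data for two services.
    sample = [
        ContainerUsage("service-a", cpu_requested_cores=1.0, cpu_used_cores=0.2,
                       memory_requested_bytes=512 * MiB, memory_used_bytes=400 * MiB),
        ContainerUsage("service-b", cpu_requested_cores=0.5, cpu_used_cores=0.4,
                       memory_requested_bytes=256 * MiB, memory_used_bytes=200 * MiB),
    ]
    for name, cpu, mem in find_idle(sample):
        print(f"{name}: {cpu:.0%} of requested CPU, {mem:.0%} of requested memory")
```

With this sample data only `service-a` is reported, at 20 percent of its requested CPU. Flagging on either resource is a design choice; a stricter check could require both to be under the threshold.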
Dashboard at a glance
A New Relic dashboard watches for anomalies in our services. It lets us monitor the health and performance of all our applications over time, helps pinpoint issues in an application, and shows which servers to start investigating.
Kubernetes-specific metrics
We use New Relic's Kubernetes integration to collect the following metrics:
- Application status - Running status of the containers that make up the application
- Pod status - Number and status of the pods (instances) for a given app
- CPU utilization - Highlights surplus or deficit compute capacity so it can be right-sized
- Memory utilization - Shows the memory footprint of containers and helps tune HPA (Horizontal Pod Autoscaler) settings
- Restart count - Tracks container restarts to flag crashing services
- Application version - Keeps tabs on app versions and the latest rollouts
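Utilization numbers feed directly into HPA settings. The Kubernetes HPA computes desired replicas as ceil(currentReplicas * currentUtilization / targetUtilization), using the average utilization across pods, and skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default). A minimal Python sketch of the calculation, handy for sanity-checking a target against what the dashboard shows, follows. HPA utilization is measured against the container's request; if a dashboard figure is relative to the limit instead, convert it first. The function name is hypothetical, and the real controller also applies min/max replica bounds and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA algorithm would aim for (no min/max bounds applied).

    Utilizations are average percentages of the containers' *requests*,
    which is how HPA resource targets are expressed.
    """
    if current_replicas < 1 or target_utilization <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))  # scale out
    print(desired_replicas(4, 63, 60))  # within tolerance, unchanged
    print(desired_replicas(4, 30, 60))  # scale in
```

For example, `desired_replicas(4, 90, 60)` returns 6, while `desired_replicas(4, 63, 60)` returns 4 because 63/60 falls inside the tolerance band.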
NRQL is New Relic’s SQL-like query language. You can use NRQL to retrieve detailed Kubernetes data and create a dashboard as shown above.
Below are example queries for reference, one per metric above and in the same order:
SELECT latest(status) FROM K8sContainerSample WHERE clusterName = 'EKS cluster name' AND containerName LIKE 'service-%' FACET containerName
SELECT uniqueCount(podName) FROM K8sPodSample WHERE clusterName = 'EKS cluster name' AND namespace IN ('ns1', 'ns2') AND status = 'Running' FACET deploymentName SINCE 1 MINUTE AGO
SELECT average(cpuCoresUtilization) FROM K8sContainerSample WHERE clusterName = 'EKS cluster name' AND containerName LIKE 'service-%' SINCE 1 DAY AGO FACET containerName TIMESERIES MAX
SELECT average(memoryUtilization) FROM K8sContainerSample WHERE clusterName = 'EKS cluster name' AND containerName LIKE 'service-%' SINCE 1 DAY AGO FACET containerName TIMESERIES MAX
SELECT latest(restartCount) FROM K8sContainerSample WHERE clusterName = 'EKS cluster name' AND displayName LIKE 'service-%' FACET displayName
SELECT latest(containerImage) FROM K8sContainerSample WHERE clusterName = 'EKS cluster name' AND containerName LIKE 'service-%' FACET containerName
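Hand-editing the cluster placeholder in six queries is error-prone, and curly quotes pasted from a document will break NRQL. One option is to generate the queries from a few parameters. The Python sketch below rebuilds the queries above with straight quotes. The `dashboard_queries` helper and its inputs are hypothetical, and it simply refuses values containing a single quote rather than guessing at NRQL escaping rules.

```python
from __future__ import annotations


def _literal(value: str) -> str:
    """Wrap a value in straight single quotes for NRQL.

    Values containing a single quote are rejected instead of escaped,
    to avoid relying on escaping rules this sketch does not verify.
    """
    if "'" in value:
        raise ValueError(f"unsupported single quote in {value!r}")
    return f"'{value}'"


def dashboard_queries(cluster: str, prefix: str, namespaces: list[str]) -> dict[str, str]:
    """Build the six dashboard queries for one cluster, keyed by metric."""
    if not namespaces:
        raise ValueError("at least one namespace is required for the pod query")
    c = _literal(cluster)
    like = _literal(prefix + "%")  # e.g. 'service-%'
    ns = ", ".join(_literal(n) for n in namespaces)
    containers = f"FROM K8sContainerSample WHERE clusterName = {c} AND containerName LIKE {like}"
    return {
        "application_status": f"SELECT latest(status) {containers} FACET containerName",
        "pod_status": (
            f"SELECT uniqueCount(podName) FROM K8sPodSample WHERE clusterName = {c} "
            f"AND namespace IN ({ns}) AND status = 'Running' "
            "FACET deploymentName SINCE 1 MINUTE AGO"
        ),
        "cpu_utilization": (
            f"SELECT average(cpuCoresUtilization) {containers} "
            "SINCE 1 DAY AGO FACET containerName TIMESERIES MAX"
        ),
        "memory_utilization": (
            f"SELECT average(memoryUtilization) {containers} "
            "SINCE 1 DAY AGO FACET containerName TIMESERIES MAX"
        ),
        # The restart query above filters and facets on displayName, so keep that.
        "restart_count": (
            "SELECT latest(restartCount) FROM K8sContainerSample "
            f"WHERE clusterName = {c} AND displayName LIKE {like} FACET displayName"
        ),
        "application_version": f"SELECT latest(containerImage) {containers} FACET containerName",
    }


if __name__ == "__main__":
    # Hypothetical cluster name, container prefix and namespaces.
    for metric, nrql in dashboard_queries("prod-eks", "service-", ["ns1", "ns2"]).items():
        print(f"{metric}: {nrql}")
```

For example, `dashboard_queries("prod-eks", "service-", ["ns1", "ns2"])["pod_status"]` yields the pod query for a hypothetical `prod-eks` cluster, ready to paste into a dashboard widget.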
Many other metrics can be fetched and visualized on a New Relic dashboard.
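NRQL is not limited to dashboards: New Relic's NerdGraph GraphQL API can run a query and return the results, which is handy for scripts or ad-hoc checks. Below is a minimal Python sketch using only the standard library. It assumes a New Relic user API key sent in the `API-Key` header and the US endpoint (EU accounts use `https://api.eu.newrelic.com/graphql`); the environment variable names `NEW_RELIC_API_KEY` and `NEW_RELIC_ACCOUNT_ID` and the helper names are hypothetical. Check the NerdGraph documentation for your account before relying on it.

```python
import json
import os
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"  # US region endpoint

NRQL_QUERY = """
query($accountId: Int!, $nrql: Nrql!) {
  actor {
    account(id: $accountId) {
      nrql(query: $nrql) {
        results
      }
    }
  }
}
"""


def build_nrql_request(account_id: int, nrql: str) -> dict:
    """GraphQL request body asking NerdGraph to run one NRQL query."""
    return {"query": NRQL_QUERY, "variables": {"accountId": account_id, "nrql": nrql}}


def run_nrql(account_id: int, nrql: str, api_key: str) -> list:
    """POST the query to NerdGraph and return the list of result rows."""
    body = json.dumps(build_nrql_request(account_id, nrql)).encode("utf-8")
    req = urllib.request.Request(
        NERDGRAPH_URL,
        data=body,
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # GraphQL reports problems in an "errors" list alongside (or instead of) data.
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["actor"]["account"]["nrql"]["results"]


if __name__ == "__main__":
    key = os.environ.get("NEW_RELIC_API_KEY")
    account = os.environ.get("NEW_RELIC_ACCOUNT_ID")
    # Only call the API when credentials are configured.
    if key and account:
        query = "SELECT latest(restartCount) FROM K8sContainerSample FACET containerName"
        for row in run_nrql(int(account), query, key):
            print(row)
```

Keeping the GraphQL text fixed and passing the NRQL as a variable avoids building GraphQL strings by hand when queries contain quotes.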