EKS Logs
Challenges with EKS Logs
As the applications running on an Amazon EKS cluster grow, so does the volume of logs they produce. The extra data increases storage needs and cost, adds operational overhead, and makes it harder to identify and troubleshoot issues. Efficient log management is therefore essential to running an EKS cluster smoothly.
How to Solve Them Using Observo
Observo is our solution to the growing EKS log problem. It collects logs from Amazon CloudWatch or other sources, processes and aggregates them, and samples the data. Users can then send the logs to their preferred destinations, such as Splunk or Elastic. With Observo, organizations get a unified view of their logs, which simplifies issue analysis and performance optimization in their Amazon EKS environments.
Case Study and Best Practices
To illustrate recommended approaches for optimizing EKS logs with Observo, let's walk through a practical case study. We collected a week's worth of EKS control plane logs from a cluster running a modest number of applications. When we examined the log streams of the different Kubernetes components, we found that 98% of the log volume comes from a single log stream named kube-api-server-audit.
Here is a parsed log line from kube-api-server-audit:
{
"kind":"Event",
"apiVersion":"audit.k8s.io/v1",
"level":"Metadata",
"auditID":"0790da55-dbde-430c-a5ab-a0583e80949b",
"stage":"ResponseComplete",
"requestURI":"/api/v1/namespaces/kube-system/configmaps/cp-vpc-resource-controller",
"verb":"update",
"user":{
"username":"eks:vpc-resource-controller",
"groups":[
"system:authenticated"
]
},
"sourceIPs":[
"172.16.46.254"
],
"userAgent":"controller/v0.0.0 (linux/amd64) kubernetes/$Format/leader-election",
"objectRef":{
"resource":"configmaps",
"namespace":"kube-system",
"name":"cp-vpc-resource-controller",
"uid":"53da4f12-b830-4ab0-a50b-c21e8d27c108",
"apiVersion":"v1",
"resourceVersion":"44582269"
},
"responseStatus":{
"metadata":{},
"code":200
},
"requestReceivedTimestamp":"2023-08-09T21:15:21.227177Z",
"stageTimestamp":"2023-08-09T21:15:21.236826Z",
"annotations":{
"authorization.k8s.io/decision":"allow",
"authorization.k8s.io/reason":"RBAC: allowed by RoleBinding \"eks-vpc-resource-controller-rolebinding/kube-system\" of Role \"eks-vpc-resource-controller-role\" to User \"eks:vpc-resource-controller\""
}
}
We will optimize these events step by step, guided by our analysis.
Reduce Logs
After examining nearly half a million of these events, we observed that almost all of them have a unique objectRef.resourceVersion. However, many events share the same objectRef.name, objectRef.namespace, and objectRef.uid. This suggests that a single object was modified many times, with each modification generating a new event carrying a fresh resourceVersion value. We can therefore aggregate the events for one object over a brief time window and summarize the resourceVersion field. Observo's reduce transform can handle this aggregation.
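To make the aggregation idea concrete, here is a minimal Python sketch. It is not Observo's reduce transform or its configuration; the function names, the 60-second tumbling window, and the output fields resourceVersions and eventCount are assumptions chosen for illustration. It groups events by objectRef.name, objectRef.namespace, and objectRef.uid within one window and collects the resourceVersion values into a list.

```python
import copy
from datetime import datetime, timezone


def window_index(timestamp: str, window_seconds: int) -> int:
    """Map a timestamp such as 2023-08-09T21:15:21.227177Z to a window number."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp()) // window_seconds


def reduce_audit_events(events, window_seconds=60):
    """Merge events that touch the same object within the same time window.

    Hypothetical helper: the window length and the summary field names
    (resourceVersions, eventCount) are illustrative assumptions.
    """
    summaries = {}
    for event in events:
        ref = event.get("objectRef", {})
        key = (
            ref.get("name"),
            ref.get("namespace"),
            ref.get("uid"),
            window_index(event["requestReceivedTimestamp"], window_seconds),
        )
        if key not in summaries:
            # The first event of a group becomes the base of the summary.
            summary = copy.deepcopy(event)
            summary.get("objectRef", {}).pop("resourceVersion", None)
            summary["resourceVersions"] = []
            summary["eventCount"] = 0
            summaries[key] = summary
        summaries[key]["eventCount"] += 1
        version = ref.get("resourceVersion")
        if version is not None:
            summaries[key]["resourceVersions"].append(version)
    return list(summaries.values())
```

This sketch keys only on the object fields discussed above. A real pipeline would probably also want verb and user.username in the grouping key so that, for example, an update and a delete on the same object are not merged into one summary.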
Sample Lease
In our sample logs, 65% of the entries are Kubernetes lease events, which are vital for leader election and for maintaining cluster reliability through heartbeat signals. However, retaining every lease event may not be cost-effective, which makes sampling worth considering.
Observo can sample events at customizable rates and match them by regex on any field, giving precise control over which data is kept.
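The following Python sketch illustrates rate-based sampling with a regex match on a field. It does not reproduce Observo's sampling feature; get_field, sample_matching, and the keep_one_in parameter are hypothetical names, and identifying lease events by objectRef.resource matching "leases" is an assumption about how those events appear in the audit stream. Events that do not match the pattern pass through untouched.

```python
import re


def get_field(event, dotted_path):
    """Walk a dotted path such as 'objectRef.resource'; return None if missing."""
    value = event
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def sample_matching(events, field, pattern, keep_one_in):
    """Yield all non-matching events and one in every `keep_one_in` matching ones."""
    regex = re.compile(pattern)
    seen = 0
    for event in events:
        value = get_field(event, field)
        if isinstance(value, str) and regex.search(value):
            seen += 1
            # Keep the 1st, (N+1)th, (2N+1)th ... matching event; drop the rest.
            if (seen - 1) % keep_one_in != 0:
                continue
        yield event


# Example: keep 1 in 5 lease events, and every other event.
stream = [{"objectRef": {"resource": "leases"}} for _ in range(10)]
stream.append({"objectRef": {"resource": "configmaps"}})
kept = list(sample_matching(stream, "objectRef.resource", r"^leases$", keep_one_in=5))
```

A counter-based sampler like this one is simple and gives an exact rate, but it is stateful, so each worker in a distributed pipeline would keep its own count.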
Sample Read Verb
In our case study, most log events use one of three verbs: get, watch, and list. Given the volume involved, it may not be necessary to retain all of them. Observo's sample transform lets us keep events matching these verbs at a specified rate, which reduces the storage and analysis burden.
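As a rough illustration of verb-based sampling, the Python sketch below keeps every write event and a fixed fraction of read events. It is a stand-in for the idea, not Observo's sample transform; READ_VERBS, keep_event, and keep_one_in are hypothetical names, and hashing auditID to make the decision is a design choice of this sketch rather than something the case study describes.

```python
import zlib

# Verbs treated as reads in this sketch (taken from the case study).
READ_VERBS = {"get", "watch", "list"}


def keep_event(event, keep_one_in=10):
    """Return True if the event should be retained.

    Non-read verbs are always kept. Read verbs are kept when a hash of
    auditID falls into 1 of `keep_one_in` buckets, so the decision is
    stateless and repeatable for the same event.
    """
    if event.get("verb") not in READ_VERBS:
        return True
    digest = zlib.crc32(event.get("auditID", "").encode("utf-8"))
    return digest % keep_one_in == 0


def sample_read_verbs(events, keep_one_in=10):
    """Filter a list of audit events, sampling only get, watch, and list."""
    return [event for event in events if keep_event(event, keep_one_in)]
```

Because the decision depends only on auditID, records that share an ID are kept or dropped together. The retained fraction is approximate rather than exact, since it depends on how the hashes happen to distribute.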

