General Guidelines for Pod Requests and Limits

To ensure the stability, scalability, and efficient use of resources within our Kubernetes cluster, it is important to follow specific guidelines when configuring resource requests and limits for pods. This section provides guidelines for setting resource requests and limits for pods and maintaining an optimal cluster state.

Understand Resource Requests and Limits

  • Requests: The amount of CPU and memory guaranteed to a container. These are the minimum resources your application needs to run efficiently.

  • Limits: The maximum amount of CPU and memory a container is allowed to use. These prevent a single container from consuming excessive resources and impacting other applications.

Scaling and Resource Limits

  • Favor Scaling Over High Limits: Instead of setting high resource limits for a single pod, consider scaling up the number of replicas for your application. This approach distributes the load more evenly across the cluster and improves the application’s scalability and resilience (see the example Deployment after this list).

  • Kubernetes Scheduler Limitations: The Kubernetes scheduler does not calculate limit values when scheduling pods. Therefore, limits cannot be used as a guarantee of resource availability. Limits serve to cap resource usage but do not ensure that the resources will always be available if the node is under heavy load.
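
As a sketch of the scaling-first approach, the Deployment below runs three modestly sized replicas instead of one pod with very high limits. The name, image, and resource values are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3                  # scale out across several modest pods
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: example-image
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:              # kept modest; add replicas rather than raising these
            memory: "512Mi"
            cpu: "500m"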

Example Configuration

Here is an example of setting resource requests and limits in a pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: example-image
    resources:
      requests:        # minimum resources guaranteed to the container
        memory: "512Mi"
        cpu: "500m"
      limits:          # maximum the container is allowed to use
        memory: "1Gi"
        cpu: "1000m"

Set Realistic Requests and Limits

  • Avoid Overprovisioning: Setting requests and limits too high can lead to resource wastage.

  • Avoid Underprovisioning: Setting requests and limits too low can lead to resource starvation and instability.

Monitor Resource Usage

  • Continuous Monitoring: Use Prometheus and Grafana to collect and visualize resource metrics. Monitor CPU and memory usage at the pod level.

  • Defining Alerts: Configure alerts for high resource usage and pods running out of memory or CPU. Set up alerts for resource quota breaches (an example alert rule follows this list).

  • Regular Reviews: Conduct regular audits of resource requests and limits to identify and rectify misconfigurations. Generate reports to review resource usage trends and adjust policies accordingly.

  • You can go to Cloud Platform Grafana and select your namespace to view your pod resource usage.

  • You may refer to Cloud Platform Metrics and Dashboards for more information.

  • You can also use kubectl top to get a snapshot of resource usage for pods inside your namespace.

kubectl top pods -n <namespace>
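
As an illustration of the alerting bullet above, here is a sketch of a PrometheusRule that fires when a container sits above 90% of its memory limit for ten minutes. The rule and alert names and the threshold are illustrative, and the expression assumes the standard cAdvisor and kube-state-metrics metrics are being scraped:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-resource-alerts    # illustrative name
spec:
  groups:
  - name: resource-usage
    rules:
    - alert: ContainerNearMemoryLimit
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
          /
        max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.9
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is above 90% of its memory limit"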

Best Practices for Setting Resource Limits

Avoid Resource-Intensive and Non-Scalable Monolithic Workloads

  • Design your applications as microservices rather than monolithic applications. This allows better scalability and resource utilization by enabling individual services to scale independently based on their specific resource needs.

  • Decompose Monolithic Applications: Break down large, monolithic applications into smaller, manageable services that can be scaled independently. This decomposition improves fault isolation and simplifies management, as smaller services are easier to deploy and monitor.

  • Resource Efficiency: Ensure that each microservice requests only the resources it needs and scales independently based on its load. This prevents any single service from over-consuming resources and allows for more granular control over resource allocation.

  • Kubernetes Native: Utilize Kubernetes features like Horizontal Pod Autoscaler to scale your microservices based on CPU/memory usage automatically. HPA adjusts the number of pod replicas based on observed resource usage, ensuring efficient resource utilization.

Understand Your Workload

  • Analyze Consumption Patterns: Understand the resource consumption patterns of your application. Use historical data and monitoring tools to analyze CPU and memory usage over time, identifying peak usage periods and average consumption (see the example query after this list).

  • Performance Testing: Determine the CPU and memory requirements under typical and peak loads. Conduct load testing to simulate different usage scenarios and ensure your resource requests and limits are sufficient to handle expected traffic without overprovisioning.
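
For example, a PromQL query along the following lines, run in Grafana or the Prometheus console, shows the 95th percentile of each pod's CPU usage over the past week, which is a reasonable starting point for sizing requests. The namespace placeholder and window sizes are illustrative:

# 95th percentile of per-pod CPU usage (cores), sampled in 5m windows over 7 days
quantile_over_time(
  0.95,
  sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="<namespace>"}[5m]))[7d:5m]
)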

Start with Resource Requests

  • Set Minimal Resources: Set requests based on the minimum resources your application needs to run efficiently. Requests ensure that the container gets at least the specified resources, which helps maintain application stability and performance.

Set Resource Limits

  • Define Maximum Resources: Set limits to define the maximum resources your application can use to prevent excessive resource consumption. Limits protect the cluster from a single container consuming too many resources, which could negatively impact other applications.

Avoid Overcommitting Resources

  • Resource Efficiency: Avoid setting resource limits too high compared to the actual needs of your application to ensure efficient utilization and prevent resource starvation for other pods. Use monitoring data to make informed decisions about resource allocation.

Use Autoscaling

  • Horizontal Pod Autoscaler (HPA): Automatically adjust the number of pods based on CPU or memory usage. HPA helps manage fluctuating workloads by scaling the number of pod replicas up or down in response to observed resource usage, ensuring optimal performance and resource utilization.
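
A minimal HPA sketch, where the target Deployment name, replica bounds, and utilization threshold are placeholders for your own values:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment     # illustrative target
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # measured relative to the pods' CPU requests

Note that utilization targets are measured against the pods' CPU requests, which is another reason to set requests accurately.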

Monitor and Adjust

  • Continuous Adjustment: Monitor resource usage continuously using tools like Prometheus and Grafana. Adjust requests and limits based on observed usage and application performance to ensure that resources are used efficiently and that your application performs well under varying conditions.

Use CPU and Memory Limits Judiciously

  • CPU Limits: Set limits slightly higher than the average CPU usage to handle spikes without throttling. This prevents performance degradation during high-load periods (a combined example follows this list).

  • Memory Limits: Set limits based on the worst-case memory usage to prevent Out-Of-Memory (OOM) kills. Ensuring that your application has enough memory to handle peak loads without exceeding its limits helps maintain stability.
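
Putting both points together, a container's resources block might look like this sketch, where the numbers are illustrative and should come from your own monitoring data:

resources:
  requests:
    cpu: "250m"      # roughly the observed average usage
    memory: "512Mi"
  limits:
    cpu: "500m"      # headroom above average to absorb spikes without throttling
    memory: "1Gi"    # sized to worst-case usage to avoid OOM kills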

Profile Different Environments

  • Environment-Specific Settings: Set different resource requests and limits for development, testing, and production environments. Ensure production settings are more stringent and aligned with real-world usage patterns to maintain performance and reliability in the live environment.
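
As a sketch, the same container might carry different values per environment; all numbers here are illustrative:

# development: small footprint, relaxed sizing
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "250m"
    memory: "256Mi"

# production: sized from observed real-world usage
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"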

The Cloud Platform team will regularly review the cluster's resource usage. If we identify any unusual patterns of higher resource consumption, we will reach out to the respective teams to understand their workloads better and discuss possible adjustments.

Feel free to contact us in #ask-cloud-platform on Slack if you have any concerns or questions.
