{"id":638,"date":"2019-08-22T19:27:59","date_gmt":"2019-08-22T19:27:59","guid":{"rendered":"https:\/\/www.peopleperhour.com\/engineering\/?p=638"},"modified":"2019-08-23T08:11:23","modified_gmt":"2019-08-23T08:11:23","slug":"kubernetes-upgrading-from-autoscaling-v2beta1-to-autoscaling-v2beta2","status":"publish","type":"post","link":"https:\/\/www.peopleperhour.com\/engineering\/2019\/08\/22\/kubernetes-upgrading-from-autoscaling-v2beta1-to-autoscaling-v2beta2\/","title":{"rendered":"Kubernetes upgrading from autoscaling\/v2beta1 to autoscaling\/v2beta2"},"content":{"rendered":"\n<p>We use a <code>HorizontalPodAutoscaler<\/code> Kubernetes resource to scale Pods that work off items from our AWS SQS queues. We found the scale-up to be very aggressive and wondered whether the new version would help. I couldn&#8217;t find any documentation about the syntax change in v2beta2 <strong>for object metrics<\/strong>. Since I spent more than an hour working it out <a href=\"https:\/\/kubernetes.io\/docs\/reference\/generated\/kubernetes-api\/v1.13\/#metrictarget-v2beta2-autoscaling\">from the raw spec<\/a>, I thought I would put the changes here in case it helps anyone else.<\/p>\n\n\n\n<p>Before (v2beta1):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">apiVersion: autoscaling\/v2beta1\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: pph-notifications-listener\n  namespace: default\nspec:\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: pph-notifications-listener\n  minReplicas: 1\n  maxReplicas: 5\n  metrics:\n  - type: Object\n    object:\n      metricName: redacted_qname_sqs_approximatenumberofmessages\n      target:\n        kind: Namespace\n        name: default\n      targetValue: 250<\/pre>\n\n\n\n<p>After (v2beta2):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">apiVersion: autoscaling\/v2beta2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: pph-notifications-listener\n  namespace: default\nspec:\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: pph-notifications-listener\n  minReplicas: 1\n  maxReplicas: 5\n  metrics:\n  - type: Object\n    object:\n      metric:\n        name: redacted_qname_sqs_approximatenumberofmessages\n      describedObject:\n        kind: Namespace\n        name: default\n      target:\n        type: Value\n        value: 250<\/pre>\n\n\n\n<p>The HPA gets these metrics from our Kubernetes custom metrics API, which gets them from Prometheus, which gets them via a <a href=\"https:\/\/coreos.com\/blog\/the-prometheus-operator.html\">ServiceMonitor<\/a> from <a href=\"https:\/\/github.com\/jmal98\/sqs-exporter\">sqs-exporter<\/a>, which gets them from CloudWatch. Simple!<\/p>\n\n\n\n<p>Look how aggressive the <code>v2beta1<\/code> scale-up is:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"495\" src=\"https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/aggressivescaleup-1024x495.jpg\" alt=\"\" class=\"wp-image-639\" srcset=\"https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/aggressivescaleup-1024x495.jpg 1024w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/aggressivescaleup-300x145.jpg 300w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/aggressivescaleup-768x371.jpg 768w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/aggressivescaleup.jpg 1852w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>See how, the moment the value goes over the <code>target<\/code>, Pods are scaled up to the max! 
The problem is that EKS is a managed service and the kube-controller-manager runs on a master node we don&#8217;t control, so we can&#8217;t change key settings like <code>--horizontal-pod-autoscaler-sync-period<\/code> or <code>--horizontal-pod-autoscaler-downscale-stabilization<\/code> (<a href=\"https:\/\/stackoverflow.com\/questions\/54525009\/change-the-horizontal-pod-autoscaler-sync-period-in-eks\">ref<\/a>).<\/p>\n\n\n\n<p><em>Update<\/em>: Unfortunately, upgrading to <code>v2beta2<\/code> didn&#8217;t help with our aggressive scale-up problem:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"911\" src=\"https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/Selection_372-1024x911.jpg\" alt=\"\" class=\"wp-image-642\" srcset=\"https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/Selection_372-1024x911.jpg 1024w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/Selection_372-300x267.jpg 300w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/Selection_372-768x684.jpg 768w, https:\/\/www.peopleperhour.com\/engineering\/wp-content\/uploads\/2019\/08\/Selection_372.jpg 1547w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Figuring out this issue is important: the surge in resource requests makes our cluster grow and shrink needlessly, and the extra Pod churn makes observability harder and generates more logs than necessary.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We use a HorizontalPodAutoscaler Kubernetes resource to scale Pods that work off items from our AWS SQS queues. We found the scale-up to be very aggressive and wondered whether the new version would help. 
I couldn&#8217;t find any documentation about the syntax change in v2beta2&#8230;<\/p>\n","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[23],"tags":[44],"class_list":["post-638","post","type-post","status-publish","format-standard","hentry","category-devops-2","tag-kubernetes"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p2CA4w-ai","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/posts\/638","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/comments?post=638"}],"version-history":[{"count":3,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/posts\/638\/revisions"}],"predecessor-version":[{"id":643,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/posts\/638\/revisions\/643"}],"wp:attachment":[{"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/media?parent=638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/categories?post=638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.peopleperhour.com\/engineering\/wp-json\/wp\/v2\/tags?po
st=638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}