GKE Nodepool Add Labels Without overwriting existing labels

GKE has a feature to add node labels to all nodes in a nodepool. GKE applies the label both to nodes already running in the cluster and to newly added nodes.

You can use the feature like this:

gcloud container node-pools update my-node-pool \
  --cluster my-cluster --labels sam …
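Since --labels on node-pools update replaces the entire label map, one way to avoid wiping existing labels is to read the current ones first and merge before updating. A minimal sketch, assuming describe prints the current label map semicolon-separated with the value() format (label names below are illustrative):

```shell
# Assumption: gcloud prints the label map as e.g. env=prod;team=ml via:
#   existing=$(gcloud container node-pools describe my-node-pool \
#     --cluster my-cluster --format="value(config.labels)")
existing="env=prod;team=ml"   # illustrative stand-in for the describe output
new_label="tier=batch"
# Merge and convert to the comma-separated form that --labels expects
merged=$(printf '%s;%s' "$existing" "$new_label" | tr ';' ',')
echo "$merged"                # env=prod,team=ml,tier=batch
# Then update the pool with the merged set:
#   gcloud container node-pools update my-node-pool \
#     --cluster my-cluster --labels "$merged"
```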

Continue reading »

GKE list tainted nodepools with a specific taint

A use case during upgrades involved listing all the node pools that have scaled back down to 0 and carry a specific taint. This blog post shows the commands you can use to get this information.

List the GKE nodepools that have been tainted with key=upgrade …
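As a sketch of the kind of filtering involved (the pool names are illustrative, and the JSON shape is an assumption about what gcloud returns with --format=json):

```shell
# Sample of the JSON shape returned by:
#   gcloud container node-pools list --cluster my-cluster --format=json
cat > /tmp/pools.json <<'EOF'
[
  {"name": "default-pool", "config": {}},
  {"name": "upgrade-pool",
   "config": {"taints": [{"key": "upgrade", "value": "true", "effect": "NO_SCHEDULE"}]}}
]
EOF
# Print only the node pools carrying a taint with key "upgrade"
jq -r '.[] | select(.config.taints[]?.key == "upgrade") | .name' /tmp/pools.json
# prints: upgrade-pool
```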

Continue reading »

3 tips for GKE ML/batch workloads

There has been an influx of large batch and ML training workloads on GKE. I've personally had the pleasure of working with one of those workloads. The things that batch and ML workloads often require from GKE are the following:

  • Minimize pod disruptions since pods often can't simply be restarted …
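One common way to reduce autoscaler-driven disruptions, for instance, is the standard cluster-autoscaler safe-to-evict annotation (the pod name below is illustrative):

```shell
# Tell the cluster autoscaler not to evict this pod when it considers
# scaling down the node the pod is running on
kubectl annotate pod my-training-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict="false"
```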

Continue reading »

GKE Safely Drain a Nodepool without pod disruptions

GKE/K8s wasn't originally designed for workloads that spin up single pods and want those pods to stay up and running on the same node for a very long time. That doesn't mean those kinds of workloads aren't running on GKE. In fact, there are large GKE ML/batch platform workloads running …
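The usual building blocks for this are cordon and drain; a rough sketch, with the pool name illustrative and the exact drain flags depending on your workloads:

```shell
POOL=my-node-pool
# All nodes in the pool, selected via the GKE-managed node label
NODES=$(kubectl get nodes -l "cloud.google.com/gke-nodepool=${POOL}" -o name)
# Cordon everything first so no new pods land on the pool
for node in $NODES; do
  kubectl cordon "$node"
done
# Then drain nodes one at a time; DaemonSet pods are left in place
for node in $NODES; do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done
```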

Continue reading »

Deploying a Weaviate cluster on GKE

Weaviate has great docs on how to deploy on K8s using Helm; however, this guide is specifically focused on an end-to-end deployment of Weaviate on GKE with replication turned on. The following topics will be covered:

  • Creating and configuring your GKE cluster
  • Deploying Weaviate with Helm
  • Tweaking the Weaviate helm …
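The Helm part of the deployment looks roughly like this (release name, namespace, and the replica count are illustrative, not taken from the guide):

```shell
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update
# Run more than one replica so replication can be enabled
helm upgrade --install weaviate weaviate/weaviate \
  --namespace weaviate --create-namespace \
  --set replicas=3
```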

Continue reading »

GKE GPU timesharing and resource quotas experiment

Only have a few GPUs but want your end users to think you have many? Then GKE GPU timesharing might just be the feature for you to save costs on underutilized GPUs. In this blog post you will learn:

  1. Creating a GKE nodepool with timesharing enabled …
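Creating such a nodepool is roughly a single gcloud call (pool name, machine type, GPU type, and the client count below are illustrative):

```shell
gcloud container node-pools create gpu-timeshare-pool \
  --cluster my-cluster \
  --machine-type n1-standard-4 \
  --accelerator "type=nvidia-tesla-t4,count=1,gpu-sharing-strategy=time-sharing,max-shared-clients-per-gpu=4"
```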

Continue reading »

GKE move system services (kube-dns, calico) to dedicated nodepool

GKE by default deploys kube-dns and other system services to any of your nodepools. This is probably fine for most cases, but certain use cases might require preventing system services from running on the same nodes where your applications are running. This blog post provides instructions on how …
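One way to get this separation, as a sketch: leave a dedicated pool untainted for system services and taint the application pools, since system pods won't tolerate a custom taint (pool names and taint key are illustrative):

```shell
# Untainted pool: system services like kube-dns can land here
gcloud container node-pools create system-pool \
  --cluster my-cluster --num-nodes 2
# Tainted pool: system pods won't tolerate the taint, so only your own
# workloads (with a matching toleration) schedule here
gcloud container node-pools create app-pool \
  --cluster my-cluster \
  --node-taints dedicated=app:NoSchedule
```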

Continue reading »

GKE docker registry with HTTP proxy

You are at one of those places that requires you to use a proxy to access your company-wide Docker registry. Sometimes HTTP proxies are used to supposedly improve security or to work around IP-based rate limits. Well, good luck, you're in for a ride on how to do this …
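For Docker-based nodes, the standard mechanism is a systemd drop-in that sets the proxy environment for the Docker daemon. Proxy host and port below are illustrative, and on GKE you would typically have to apply this via a privileged DaemonSet, since the nodes are managed:

```shell
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,metadata.google.internal"
EOF
systemctl daemon-reload
systemctl restart docker
```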

Continue reading »

GKE custom OSS K8s cluster autoscaler

Update 2023-03-27: Added instructions for clusters using Workload Identity

This blog post describes how to deploy your own K8s cluster autoscaler instead of the cluster autoscaler that's bundled with GKE. This can be helpful in the rare case that the bundled GKE cluster autoscaler doesn't work for you.
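The upstream autoscaler ships a Helm chart, so the deployment is roughly of this shape (the MIG name and size limits are illustrative, and the exact values depend on your cluster):

```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set cloudProvider=gce \
  --set "autoscalingGroups[0].name=my-mig" \
  --set "autoscalingGroups[0].minSize=0" \
  --set "autoscalingGroups[0].maxSize=10"
```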

Note that …

Continue reading »

Custom DNS entry with KubeDNS stubdomain

An example use case that I've seen is where you have a K8s service exposed on the ClusterIP and you want to make that service accessible over a domain name that you don't control.

You can follow these steps to set this up:

  1. Deploy CoreDNS with custom DNS …
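The stubdomain part of the setup is a kube-dns ConfigMap entry pointing the domain at your CoreDNS service's ClusterIP (the domain and IP below are illustrative):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.internal": ["10.0.0.10"]}
EOF
```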

Continue reading »