Talk log from KubeCon Chicago

clux December 22, 2023 [software] #kubernetes

As time goes by, I find myself increasingly uninterested in actually travelling out to a convention, when my preferred way of consuming conference talks is overwhelmingly in VOD form with 2x/FF potential.

This is doubly so when the convention is increasingly corporate (like KubeCon), and the CNCF youtube channel is in general high quality. Get it before your ad-blockers break.

This post contains a quick export from my personal notes on some significant talks at KubeConNA'23 (make of them what you will).

My interest areas this kubecon fall broadly into a handful of categories, and i've grouped talks by them.


How Prometheus Halved Its Memory Storage

nice journey into prometheus space optimization and how much the layout of their labels matters. great work that everyone who upgraded prometheus this year benefited from.
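to make the label-layout point a bit more concrete: one technique in this space (as i understood it) is storing a series' whole label set as one contiguous, length-prefixed buffer instead of a list of separately allocated name/value pairs. the toy sketch below assumes a simple varint-length-prefixed layout with names sorted; it's illustrative python, not prometheus's actual go code.

```python
# Toy sketch of packing a label set into one buffer.
# Assumptions: varint length prefixes, names sorted. Not Prometheus code.


def _put_uvarint(buf: bytearray, n: int) -> None:
    """Append n as an unsigned varint (7 bits per byte, high bit = more follows)."""
    while n >= 0x80:
        buf.append((n & 0x7F) | 0x80)
        n >>= 7
    buf.append(n)


def _get_uvarint(data: bytes, i: int) -> tuple[int, int]:
    """Decode a varint starting at offset i; return (value, next_offset)."""
    value, shift = 0, 0
    while True:
        b = data[i]
        i += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, i
        shift += 7


def pack_labels(labels: dict[str, str]) -> bytes:
    """Layout: len(name) name len(value) value ... for each label, sorted by name."""
    buf = bytearray()
    for name in sorted(labels):  # sorted, so equal label sets pack identically
        for part in (name, labels[name]):
            raw = part.encode("utf-8")
            _put_uvarint(buf, len(raw))
            buf += raw
    return bytes(buf)


def unpack_labels(data: bytes) -> dict[str, str]:
    """Inverse of pack_labels."""
    out: dict[str, str] = {}
    i = 0
    while i < len(data):
        n, i = _get_uvarint(data, i)
        name = data[i:i + n].decode("utf-8")
        i += n
        n, i = _get_uvarint(data, i)
        out[name] = data[i:i + n].decode("utf-8")
        i += n
    return out


packed = pack_labels({"job": "api", "__name__": "up"})
print(len(packed))  # 20 bytes for the whole label set
```

one allocation per series instead of one per string is the kind of thing that adds up when you have millions of series.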

Evaluating Observability Agent Performance

good overview of where the utilization overhead comes from and how to think in terms of backpressure.
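a toy model of what backpressure means here: a fixed-capacity buffer between an agent's receiver and exporter that refuses new data when full, pushing the decision (retry, slow down, drop) back to the producer instead of growing memory without bound. the class name, capacity and batch size are hypothetical, not any specific agent's design.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity buffer that signals backpressure instead of growing."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: deque = deque()
        self.rejected = 0  # how many pushes were refused (a useful agent metric)

    def try_push(self, item) -> bool:
        """Accept item if there is room; otherwise refuse so the caller backs off."""
        if len(self._items) >= self.capacity:
            self.rejected += 1
            return False
        self._items.append(item)
        return True

    def drain(self, max_batch: int) -> list:
        """Exporter side: remove up to max_batch items, oldest first."""
        batch = []
        while self._items and len(batch) < max_batch:
            batch.append(self._items.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._items)


buf = BoundedBuffer(capacity=3)
accepted = [buf.try_push(n) for n in range(5)]
print(accepted)  # [True, True, True, False, False]
print(buf.drain(max_batch=2))  # [0, 1]
```

if the exporter is slower than the producer, the rejection counter climbs; that's the signal you want to see instead of the agent quietly eating memory until it OOMs.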

OpenTelemetry: What's Next

they're graduated now and deal with metrics (but still experimental w.r.t. rust). imo it feels really strange to be shoehorning everything here into one agent, but we'll see what benefits it can bring later (they keep talking about consistent metadata between them).

Exploring the Power of Metrics Collection with OpenTelemetry

after the 45m+ workshop, they get into the horrific setup they've made inlining scrapeConfigs inside collector crds to support metrics. i kind of look at this, and all i can think is "why tho?".

KEDA Graduation Announcements

looks increasingly nice. (i need to kill the prometheus-adapter garbage template format i currently deal with.)

keda_ metrics, scaling modifiers, pausing, job scaling, prom caching, all seem like very nice features.
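scaling modifiers, as i understand them, let KEDA collapse several trigger metrics into one composite metric via a formula, which the HPA then scales on. a rough sketch of that flow: the formula and trigger names are hypothetical (in KEDA the formula is an expression in the ScaledObject config, not python), and the scaling step uses the standard HPA rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric) with its default 10% tolerance. this models the math only, not KEDA's code.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     tolerance: float = 0.1) -> int:
    """HPA-style rule: ceil(current * current/target), skipped inside the tolerance band."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: don't flap
    return math.ceil(current_replicas * ratio)


# hypothetical composite "formula": weight queue depth and request rate into one metric
def formula(metrics: dict[str, float]) -> float:
    return metrics["queue_depth"] / 10 + metrics["rps"] / 100


composite = formula({"queue_depth": 200, "rps": 1000})
print(composite)  # 30.0
print(desired_replicas(current_replicas=4, current_value=composite, target_value=20))  # 6
```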

All You Need to Know About Prometheus in 2023

very interesting talk if you are invested in this ecosystem.

they mention keep_firing_for landing which feels great (because it was my suggestion).
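for anyone who hasn't used it: keep_firing_for sits next to for in an alerting rule and keeps an alert firing for a while after its expression stops returning results, which avoids flapping resolve/re-fire notifications. a simplified model of the state transitions, assuming one rule evaluation per call; it's a sketch of the semantics, not prometheus's rule manager code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    active_since: Optional[float] = None       # first eval where expr was true
    keep_firing_since: Optional[float] = None  # first eval where expr went false while firing
    firing: bool = False


def evaluate(state: AlertState, now: float, expr_true: bool,
             for_s: float, keep_firing_for_s: float) -> str:
    """Return 'inactive', 'pending' or 'firing' for one rule evaluation at time `now`."""
    if expr_true:
        state.keep_firing_since = None
        if state.active_since is None:
            state.active_since = now
        if now - state.active_since >= for_s:
            state.firing = True
        return "firing" if state.firing else "pending"

    if state.firing:
        # expression cleared, but hold the alert for keep_firing_for
        if state.keep_firing_since is None:
            state.keep_firing_since = now
        if now - state.keep_firing_since < keep_firing_for_s:
            return "firing"

    # fully resolved (or was only pending): reset everything
    state.active_since = None
    state.keep_firing_since = None
    state.firing = False
    return "inactive"


# evaluations every 60s with for: 2m and keep_firing_for: 5m
state = AlertState()
timeline = [(0, True), (60, True), (120, True), (180, False), (420, False), (480, False)]
states = [evaluate(state, t, v, for_s=120, keep_firing_for_s=300) for t, v in timeline]
print(states)  # ['pending', 'pending', 'firing', 'firing', 'firing', 'inactive']
```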

the q/a session revealed they think the otel collector is an anti-pattern for metrics (ruins active monitoring) and that the collector's unconventional label use fucks with perf/predictability. worth keeping in mind if you're considering the otel agent for metrics.

How and Why You Should Adopt and Expose OSS Interfaces Like Otel and Prometheus

This one was funny (to me, not actually funny). Google Monarch (their internal monitoring tool) exposes a promql interface, translating the queries into monarch's own.

lots of work for something people complain about so often (promql). goes to show that the thing people complain about is the thing people actually use.

Continuous Delivery

Flux 2.0 and Beyond; OCI + Cosign

OCI feels like a better distribution method for Kubernetes yaml than helm, and flux's source-controller pulling oci packaged tarballs has a nice flow to it. OCI + Kustomization sounds workable. their selling points:

personally, i just want to distance myself from helm upgrade/releases as much as possible and this is a nice + efficient way of doing that.

Wolfi: Intro to the Linux Undistro

the wolfi idea remains the same, and they've got a lot of momentum behind both it and chainguard. decent build system, but lots of yaml.

Keeping Helm Reliable, Stable, and Usable

kind of a boring talk, but i kept it in here to highlight one point: helm is slowing down because it's the de facto standard.

so do not expect any major new features; it's WYSIWYG (including gotpl, toYaml | nindent, manual schemas)...
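the toYaml | nindent dance is easier to appreciate once you see how little it does: as far as i know, sprig's indent pads the first line and every line after a newline with N spaces, and nindent just prepends a newline on top. a python re-creation of that behaviour (assumed from sprig's implementation; these are not helm APIs) to show why the N you pick is the whole game.

```python
def indent(spaces: int, text: str) -> str:
    """Mimics sprig's `indent`: pad the first line and every line after a newline."""
    pad = " " * spaces
    return pad + text.replace("\n", "\n" + pad)


def nindent(spaces: int, text: str) -> str:
    """Mimics sprig's `nindent`: same as indent, but starts with a newline."""
    return "\n" + indent(spaces, text)


to_yaml = "cpu: 100m\nmemory: 128Mi"  # roughly what `toYaml .Values.resources` might render
template = "resources:\n  requests:" + nindent(4, to_yaml)
print(template)
# resources:
#   requests:
#     cpu: 100m
#     memory: 128Mi
```

pass 2 instead of 4 and cpu/memory end up as siblings of requests instead of children: still valid yaml, just the wrong shape, and nothing in the template engine complains.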


Security Showdown: The Overconfident Operator Vs the Nefarious...

entertaining and great talk about problems with wide access on laptops.

Arbitrary Code & File Execution in R/O FS – Am I Write?

nice exploit demo of readOnlyRootFilesystem and ways to bypass it however it's been enabled. some truly horrendous and ugly reverse shell setups and /dev/termination-log abuse.

The Cluster Killer Bug: Learning API Priority and Fairness the Hard Way

nice intro to FlowSchemas and API Priority and Fairness through a motivating bug example.

An accompanying talk would be Kubernetes DoS Protection at Google Scale.
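the queuing part of API Priority and Fairness is the clever bit: each priority level has a set of queues, and each flow (roughly a user or a namespace, depending on the FlowSchema's distinguisherMethod) is shuffle-sharded onto a small hand of those queues, with a new request going to the shortest queue in its hand. a minimal python sketch of that idea; queues and hand_size mirror the queues/handSize fields of a PriorityLevelConfiguration, but the hashing and dealing here are simplified assumptions, not the apiserver's implementation.

```python
import hashlib


def deal_hand(flow_id: str, queues: int, hand_size: int) -> list[int]:
    """Deterministically pick `hand_size` distinct queue indices for a flow."""
    if not 0 < hand_size <= queues:
        raise ValueError("hand_size must be between 1 and queues")
    # 256 bits of hash is plenty of entropy for small queue counts
    entropy = int.from_bytes(hashlib.sha256(flow_id.encode()).digest(), "big")
    deck = list(range(queues))
    hand = []
    for remaining in range(queues, queues - hand_size, -1):
        entropy, idx = divmod(entropy, remaining)  # draw one card without replacement
        hand.append(deck.pop(idx))
    return hand


def pick_queue(flow_id: str, queue_lengths: list[int], hand_size: int) -> int:
    """Enqueue into the shortest queue among the flow's hand."""
    hand = deal_hand(flow_id, len(queue_lengths), hand_size)
    return min(hand, key=lambda q: queue_lengths[q])


lengths = [5, 0, 3, 9, 1, 7, 2, 4]
hand = deal_hand("system:serviceaccount:ci:runner", queues=8, hand_size=3)
chosen = pick_queue("system:serviceaccount:ci:runner", lengths, hand_size=3)
print(hand, chosen)  # same flow always gets the same hand; chosen is its shortest queue
```

the point of the design: a noisy flow can only fill the queues in its own hand, so other flows whose hands don't fully overlap with it still have a short queue to land in.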

RBACdoors: How Cryptominers Are Exploiting RBAC Misconfigs

hiding techniques cryptominers can use if they ever had cluster admin access, even after that access is removed. decent talk.


Gateway API: The Most Collaborative API in Kubernetes History Is GA

this is a big deal. it's needed to get canaries everywhere with HTTPRoute, and it's cool to hear them talk about a mature rollout strategy for experimental crd fields.

tbh, it was more fun to hear this framed as a way to improve Service, "the worst api in kubernetes".
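the canary bit in practice: an HTTPRoute rule can list several backendRefs, each with a weight, and traffic is split proportionally to those weights. a sketch of what that split means, assuming a hypothetical 90/10 stable/canary route; this only models the semantics, since the actual splitting is done by whichever gateway implementation you run.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pick_backend(backends: list[tuple[str, int]], r: float) -> str:
    """Choose a backend for r in [0, 1) proportionally to its weight (weight 0 = never)."""
    names = [name for name, _ in backends]
    cumulative = list(accumulate(weight for _, weight in backends))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("all weights are zero")
    return names[bisect_right(cumulative, r * total)]


route = [("app-stable", 90), ("app-canary", 10)]  # hypothetical backendRefs + weights

rng = random.Random(42)
hits = sum(pick_backend(route, rng.random()) == "app-canary" for _ in range(10_000))
print(hits / 10_000)  # roughly 0.1
```

shifting the weights step by step (90/10 -> 50/50 -> 0/100) is the whole canary rollout, which is why having it in a standard api matters.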

UX Matters: Switching to GAMMA Without Ruining Your Reputation

Istio Past Present and Future

fun dig at the rust evangelists:

"rewrite it in rust? [..] no we want to be a lot better"

then they talk about their single ambient mesh thing that still ends up with 39% cpu overhead and 22% memory overhead.

maybe they should rewrite it in rust.

jkjk. this does seem like a nice improvement for them. ran istio a while back and found the overhead insane.

a talk about the main working principles behind cilium and its mutual encryption. very interesting wireguard + SPIFFE setup. they had a lot of momentum behind them. let's see if that continues after Cisco buys them (i know how that works).

Demystifying Cilium: Learn How to Build an eBPF CNI Plugin from Scratch

while we are on the cilium train. great workshop about how a CNI can be built.


Building Better Controllers

with my kube-rs hat on, there are some cool ideas coming out of istio here (would be nice if they published it).

you could partially implement the ideas here yourself in your own code, but it's interesting to see them commit to an interface like this at the controller level.

Node Size Matters - Running K8s as Cheaply as Possible

opencost and its metrics. does a nice investigation into those metrics, proving that small cloud provider instances are the most expensive instances you can use if you can fill your nodes.
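the arithmetic behind the node-size point, under made-up numbers (not from the talk): if price scales roughly linearly with vCPUs within an instance family, but every node pays a roughly fixed overhead for daemonsets and system/kube reservations, that overhead eats a much bigger fraction of a small node.

```python
def cost_per_usable_vcpu(vcpus: float, price_per_vcpu_hour: float, overhead_vcpus: float) -> float:
    """Hourly cost per vCPU actually left for your pods after per-node overhead."""
    usable = vcpus - overhead_vcpus
    if usable <= 0:
        raise ValueError("node too small to run anything")
    return vcpus * price_per_vcpu_hour / usable


PRICE = 0.05     # hypothetical $/vCPU-hour, assumed equal across sizes in the family
OVERHEAD = 0.5   # hypothetical vCPUs lost per node to daemonsets + reservations

for size in (2, 4, 8, 16):
    print(size, round(cost_per_usable_vcpu(size, PRICE, OVERHEAD), 4))
# 2 0.0667
# 4 0.0571
# 8 0.0533
# 16 0.0516
```

the "if you can fill your nodes" caveat matters: in this model a 16 vCPU node that's only half used costs about 0.103 per used vCPU, worse than any of the above.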

Cutting Climate Costs with Kubernetes and CAPI

climate-aware scheduler idea using WattTime data, priorityclasses and KubeSchedulerConfiguration to only allow running workloads during "low emission times".

a later (much fluffier) talk shows how to integrate this with KEDA.
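the core decision behind a climate-aware scheduler is simple: given a forecast of grid carbon intensity, run deferrable work in the cleanest window. a sketch of picking the lowest-average window for a batch job, assuming an hourly forecast as a plain list of gCO2/kWh values; that data shape is hypothetical (not WattTime's api), and in the talk's setup the actual enforcement goes through priorityclasses and KubeSchedulerConfiguration rather than a function like this.

```python
def best_window(forecast: list[float], hours: int) -> tuple[int, float]:
    """Return (start_index, average_intensity) of the cleanest contiguous window."""
    if not 0 < hours <= len(forecast):
        raise ValueError("window must fit inside the forecast")
    window_sum = sum(forecast[:hours])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(forecast) - hours + 1):
        # slide the window one hour: add the new hour, drop the oldest
        window_sum += forecast[start + hours - 1] - forecast[start - 1]
        if window_sum < best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / hours


forecast = [420, 390, 310, 180, 150, 210, 380, 450]  # hypothetical gCO2/kWh per hour
print(best_window(forecast, hours=2))  # (3, 165.0)
```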

What's up with Kubernetes Long Term Support?

mentions the numerous recent KEPs to improve stability:

they want a longer cycle (WG-LTS) - because currently >2/3rds of people have clusters out-of-support

this feels like a companion talk to "Swimming with the current: make it easy to stay up to date", which also highlights how much of a problem regressions are in Kubernetes (and mostly on patch releases..).

What's New with Kubectl and Kustomize

can't help but lol when they say you shouldn't use most of the imperative subcommands: {create, run, expose, autoscale, replace, rollout undo, edit, set, patch, scale, ..}

Nix Kubernetes and the Pursuit of Reproducibility

building a nix hypervisor around libvirt with qcow2. pretty cool idea. nice sweet spot for nix in the hypervisor space, because talos/flatcar/bottlerocket are probably better for VMs, and wolfi is better for docker images.

Pods and Circumstance: CRI-O Graduation Celebration

CRI-O metric setups with kubelet/cAdvisor and how they plan to optimize it.

they use conmon-rs - a rust lib for container runtime monitoring!

Grifts Ahoy! Bracing for the AI Tide

advocates for soft-AI as a tool to help generate useful context for small niche areas.

mentions we are likely at the top of the S curve (before the trough of disillusionment): there's tons of hype and not much focus on risks, tons of exaggeration about whether it's better than non-AI approaches or whether it even uses AI at all, and it needs some laws.

Mentions a bunch of boring AI risks to consider (not the more crazy AI singularity transhumanist hype):

great talk.

Declarative Everything

My favourite talk about my favourite new feature in Kubernetes: admission validation and admission policies.

talked more about it on mastodon and ended up writing as a result

Safeguarding Clusters: Exploring the Benefits and Navigating the dangers of admission controllers

goes into details about footguns (failurePolicy, latency buildup, default timeout, scope), and some more exotic crazy failures if you accidentally block leases or flowschemas.

notes how it was hard to target negative selections until CEL; matchConditions can now be CEL expressions inside the webhook configuration resource!
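a small model of a couple of those footguns, assuming a hypothetical webhook: in admissionregistration.k8s.io/v1, failurePolicy defaults to Fail and timeoutSeconds to 10, so a slow or dead webhook rejects every request it matches; and a negative selection like "everything except leases and flowschemas" is the kind of thing matchConditions now let you express in CEL. this python only mimics the decision logic, it is not how the apiserver implements it.

```python
from enum import Enum


class Outcome(Enum):
    ALLOWED = "allowed"
    DENIED = "denied"
    ERROR = "error"      # webhook unreachable, bad response, etc.
    TIMEOUT = "timeout"  # no answer within timeoutSeconds


# hypothetical negative selection: never send these (group, resource) pairs to the webhook
EXCLUDED = {
    ("coordination.k8s.io", "leases"),
    ("flowcontrol.apiserver.k8s.io", "flowschemas"),
}


def matches(group: str, resource: str) -> bool:
    """What a CEL matchCondition would express: everything except EXCLUDED."""
    return (group, resource) not in EXCLUDED


def admit(group: str, resource: str, outcome: Outcome, failure_policy: str = "Fail") -> bool:
    """Return True if the request is admitted."""
    if not matches(group, resource):
        return True  # webhook never called, so it can't block these
    if outcome is Outcome.ALLOWED:
        return True
    if outcome is Outcome.DENIED:
        return False
    # error or timeout: failurePolicy decides
    return failure_policy == "Ignore"


print(admit("apps", "deployments", Outcome.TIMEOUT))            # False (Fail is the default)
print(admit("apps", "deployments", Outcome.TIMEOUT, "Ignore"))  # True
print(admit("coordination.k8s.io", "leases", Outcome.TIMEOUT))  # True (excluded)
```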

15,000 Minecraft Players Vs One K8s Cluster. Who Wins?

very good talk about how a cloud minecraft provider moved from GCP to bare metal with "65% cost reduction".

nice to see the stuff they take advantage of (they still off-load some stuff to cloud providers), and their heavy use of cluster api. MinIO and TopoLVM for storage is also very cool.

if nothing else, this talk is worth it for how they deal with lifecycle management of long-lived games (how do you terminate the pods?). short answer: some automatic upgrades with cluster api, some planned maintenances with warnings, and then a very long terminationGracePeriodSeconds with some advanced termination handlers reacting to SIGTERM.

Maintainer Track

Tools for Resolving Difficult Conflicts in Open Source Communities and Projects

imo the most useful thing herein is the highlighting of non-violent communication as a methodology for compassionate communication, proven to have good results (but best if practiced consistently).

The Eight Fallacies of Distributed Cloud Native Communities

similar maintainer points:

Kubeburned Out? How to get things done efficiently

good tips for contributing by building contribution routines if you have a day job (e.g. 1h before work, maybe every tuesday + thursday, 1-2h during the weekend).

make your work public:

recognise people for their work;

say no - keep yourself healthy;

good advice, but then it's more about how to take on more responsibility within kubernetes, writing KEPs, and TAG work.

timing improves success;

find habits that feel right for you. if it's not high priority for you or who pays you, don't work on it.
