Talk log from KubeCon Chicago
clux December 22, 2023 [software] #kubernetes
As time goes by, I find myself increasingly uninterested in actually travelling out to a convention, when my preferred way of consuming conference talks is overwhelmingly VOD form with 2x/FF potential.
This is doubly so when the convention is increasingly corporate (like KubeCon), and the CNCF youtube channel is in general high quality. Get it before your ad-blockers break.
This post contains a quick export from my personal notes on some significant talks at KubeConNA'23 (make of them what you will).
My interest areas this kubecon fall broadly into these categories (and I will group talks by these):
- observability related :: maintain a lot of metrics related tooling
- continuous deployment :: maintain a lot of ad-hoc cd tooling
- security :: maintain a bunch of controllers and ad-hoc validations
- networking :: traffic apis, mesh and other network sledgehammers (used occasionally)
- kubernetes :: everything else.. cool to see where the platform i build on top of ends up going
- maintainership :: am trying to scale kube-rs beyond myself
Observability
How Prometheus Halved Its Memory Storage
nice journey into prometheus space optimization and how the layout of their labels matters a lot. great work that everyone who upgraded prometheus this year benefitted from.
Evaluating Observability Agent Performance
good overview of where the utilization overhead comes from and how to think in terms of backpressure.
OpenTelemetry: What's Next
they're graduated now and deal with metrics (but still experimental w.r.t. rust). imo it feels really strange to be shoehorning everything here into one agent, but we'll see what benefits it can bring later (they keep talking about consistent metadata between them).
Exploring the Power of Metrics Collection with OpenTelemetry
after the 45m+ workshop, they get into the horrific setup they've made inlining scrapeConfigs inside collector crds to support metrics. i kind of look at this, and all i can think is "why tho?".
KEDA Graduation Announcements
looks increasingly nice. (i need to kill the prometheus-adapter garbage template format i currently deal with.)
keda_ prefixed metrics, scaling modifiers, pausing, job scaling, prom caching all seem like very nice features.
All You Need to Know About Prometheus in 2023
very interesting talk if you are invested in this ecosystem.
they mention keep_firing_for landing, which feels great (because it was my suggestion).
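for reference, a rough sketch of how you'd set it on an alert (assuming a prometheus-operator version whose Rule type exposes keep_firing_for; the alert name and expr here are made up):

```rust
use anyhow::Result;
use kube::api::{Api, ApiResource, DynamicObject, Patch, PatchParams};
use kube::core::GroupVersionKind;
use kube::Client;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<()> {
    let client = Client::try_default().await?;
    // PrometheusRule is a CRD, so address it dynamically by its GVK
    let gvk = GroupVersionKind::gvk("monitoring.coreos.com", "v1", "PrometheusRule");
    let ar = ApiResource::from_gvk(&gvk);
    let rules: Api<DynamicObject> = Api::default_namespaced_with(client, &ar);

    let rule = json!({
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": { "name": "demo-rules" },
        "spec": {
            "groups": [{
                "name": "demo.rules",
                "rules": [{
                    "alert": "HighErrorRate", // hypothetical alert + expr
                    "expr": "sum(rate(http_requests_total{code=~\"5..\"}[5m])) > 1",
                    "for": "5m",
                    // keep the alert firing for 10m after the expr stops matching,
                    // so brief dips below the threshold don't cause flapping
                    "keep_firing_for": "10m"
                }]
            }]
        }
    });
    rules
        .patch("demo-rules", &PatchParams::apply("talk-notes"), &Patch::Apply(&rule))
        .await?;
    Ok(())
}
```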
q/a session reveals they think the otel collector is an anti-pattern for metrics (ruins active monitoring) and that the collector's unconventional label use fucks with perf/predictability. worth keeping in mind if you consider using the otel agent for metrics.
How and Why You Should Adopt and Expose OSS Interfaces Like Otel and Prometheus
This one was funny (to me, not actually funny). Google Monarch (their internal monitoring tool) exposes a promql interface so they can translate promql queries into monarch ones.
lots of work for something people complain about so often (promql). goes to show that the thing people complain about is the thing people actually use.
Continuous Delivery
Flux 2.0 and Beyond; OCI + Cosign
OCI feels like a better distribution method for Kubernetes yaml than helm, and flux's source-controller pulling oci packaged tarballs has a nice flow to it. OCI + Kustomization sounds workable.
their selling points:
- colocation of artifacts + images + signatures
- passwordless auth + keyless integrity verification
- increased cd/flux controller efficiency
personally, i just want to distance myself from helm upgrade/releases as much as possible and this is a nice + efficient way of doing that.
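roughly, the pair of flux objects involved (a sketch from memory - check the api versions against your flux install; the registry path and names are made up):

```rust
use serde_json::json;

fn main() -> anyhow::Result<()> {
    // the source: flux's source-controller pulls the oci artifact (a tarball of yaml)
    let source = json!({
        "apiVersion": "source.toolkit.fluxcd.io/v1beta2",
        "kind": "OCIRepository",
        "metadata": { "name": "app-manifests", "namespace": "flux-system" },
        "spec": {
            "interval": "5m",
            "url": "oci://ghcr.io/example/app-manifests", // hypothetical registry path
            "ref": { "tag": "1.2.3" }
        }
    });
    // the sink: a flux Kustomization that applies whatever the artifact contains
    let kustomization = json!({
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": { "name": "app", "namespace": "flux-system" },
        "spec": {
            "interval": "5m",
            "prune": true,
            "path": "./",
            "sourceRef": { "kind": "OCIRepository", "name": "app-manifests" }
        }
    });
    // print as a multi-doc yaml stream you could pipe to `kubectl apply -f -`
    println!("{}---\n{}", serde_yaml::to_string(&source)?, serde_yaml::to_string(&kustomization)?);
    Ok(())
}
```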
Wolfi: Intro to the Linux Undistro
the wolfi idea remains the same, and they've got a lot of momentum behind it and chainguard. decent build system, but lots of yaml.
Keeping Helm Reliable, Stable, and Usable
kind of a boring talk, but i kept it in here to highlight one point; helm is slowing down because it's the de facto standard.
so do not expect any new major features, WYSIWYG (including gotpl, toYaml | nindent, manual schemas)...
Security
Security Showdown: The Overconfident Operator Vs the Nefarious...
entertaining and great talk about problems with wide access on laptops.
Arbitrary Code & File Execution in R/O FS – Am I Write?
nice exploit demo of readOnlyRootFilesystem and ways to bypass the various ways it can be enabled.
some truly horrendous and ugly reverse shell setups and /dev/termination-log abuse..
The Cluster Killer Bug: Learning API Priority and Fairness the Hard Way
nice intro to flowschemas and API priority and fairness through a motivating bug example.
An accompanying talk would be Kubernetes DoS Protection at Google Scale.
RBACdoors: How Cryptominers Are Exploiting RBAC Misconfigs
hiding techniques cryptominers can use if they ever had cluster admin access, even after it's removed later. decent talk.
Networking
Gateway API: The Most Collaborative API in Kubernetes History Is GA
this is a big deal. it's needed to get canaries everywhere with HTTPRoute, and it's cool to hear them talk about a mature rollout strategy for experimental crd fields.
tbh, it was more fun to hear this framed as a way to improve Service; "the worst api in kubernetes".
UX Matters: Switching to GAMMA Without Ruining Your Reputation
- linkerd on working with the gateway api, and issues and lessons with their policy controller
- presents rust in a way that's "because it's cool" but a bit manual atm. (..i'll take it)
- mentioned their leader election impl and kubert. candid and moderately interesting.
Istio Past Present and Future
fun dig at the rust evangelists:
"rewrite it in rust? [..] no we want to be a lot better"
then they talk about their single ambient mesh thing that still ends up with 39% cpu overhead and 22% memory overhead.
maybe they should rewrite it in rust.
jkjk. this does seem like a nice improvement for them. ran istio a while back and found the overhead insane.
When Is a Secure Connection Not Encrypted? and Other Stories
talk about the main working principles behind cilium and its mutual authentication and encryption.
very interesting wireguard + spiffe setup.
they had a lot of momentum behind them. let's see if that continues after Cisco buys them (i know how that works).
Demystifying Cilium: Learn How to Build an eBPF CNI Plugin from Scratch
while we are on the cilium train: a great workshop about how a CNI can be built.
Kubernetes
Building Better Controllers
with my kube-rs hat on, there are some cool ideas coming out of istio here (would be nice if they published it).
you could partially implement the ideas here yourself in your own code, but it's interesting to see them commit to an interface like this at the controller level.
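for context, the baseline these ideas would layer on top of is the usual reconciler scaffold (sketched here with kube-rs, watching ConfigMaps purely as a stand-in):

```rust
use std::{sync::Arc, time::Duration};

use futures::StreamExt;
use k8s_openapi::api::core::v1::ConfigMap;
use kube::runtime::controller::{Action, Controller};
use kube::runtime::watcher;
use kube::{Api, Client, ResourceExt};

// reconcile is called for every changed object (plus periodic requeues)
async fn reconcile(cm: Arc<ConfigMap>, _ctx: Arc<()>) -> Result<Action, kube::Error> {
    println!("reconciling {}", cm.name_any());
    Ok(Action::requeue(Duration::from_secs(300)))
}

// what to do when reconcile fails; here just back off and retry
fn error_policy(_cm: Arc<ConfigMap>, _err: &kube::Error, _ctx: Arc<()>) -> Action {
    Action::requeue(Duration::from_secs(30))
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let cms = Api::<ConfigMap>::default_namespaced(client);
    Controller::new(cms, watcher::Config::default())
        .run(reconcile, error_policy, Arc::new(()))
        .for_each(|res| async move {
            if let Err(e) = res {
                eprintln!("reconcile failed: {e:?}");
            }
        })
        .await;
    Ok(())
}
```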
Node Size Matters - Running K8s as Cheaply as Possible
opencost and their metrics. a nice investigation proving that small cloud provider instances are the most expensive instances you can use if you can fill your nodes.
Cutting Climate Costs with Kubernetes and CAPI
climate aware scheduler idea using watttime data, priorityclasses and KubeSchedulerConfiguration to only allow running workloads during "low emission times".
a later (much fluffier) talk shows how to integrate this with KEDA.
What's up with Kubernetes Long Term Support?
mentions the numerous recent KEPs to improve stability:
- KEP-1333 - 1.19+ ensures all APIs required to run clusters are GA
- KEP-1693 - New APIs are not allowed to be required until they graduate to GA
- 1.19+ has metrics for deprecated resource use
- KEP-1194 - KEPs require better reviews w.r.t. production readiness
- Deprecation policy updated to make stable api versions permanent
- KEP-3136 - New unstable APIs are OFF by default
- KEP-3744 - Kubernetes 1.23+ use supported go versions (easier to bump security fixes going forward)
- KEP-3935 - Kubernetes 1.28+ control plane nodes support n-3 version skew (annual upgrades now ok)
otherwise, the talk covers the difficulties of widening the support window too much (upgrade complexity, ecosystem dependencies, bugfix porting).
they want a longer cycle (WG-LTS) - because currently >2/3rds of people have clusters out-of-support.
this feels like a companion talk to "Swimming with the current: make it easy to stay up to date", which also highlights how much of a problem regressions are in Kubernetes (and mostly on patch releases..).
What's New with Kubectl and Kustomize
kubectl diff --prune still had a selector bug... kubectl auth whoami is new.
can't help but lol when they say you shouldn't use most of the imperative subcommands; {create, run, expose, autoscale, replace, rollout undo, edit, set, patch, scale, ..}
Nix Kubernetes and the Pursuit of Reproducibility
building a nix hypervisor around libvirt with qcow2. pretty cool idea.
nice sweet spot for nix in the hypervisor space, because talos/flatcar/bottlerocket are probably better for VMs, and wolfi is better for docker images.
Pods and Circumstance: CRI-O Graduation Celebration
CRI-O metric setups with kubelet/cAdvisor and how they plan to optimize it.
they use conmon-rs - a rust lib for container runtime monitoring!
Grifts Ahoy! Bracing for the AI Tide
advocates for soft-AI as a tool to help generate useful context for small niche areas.
mentions we are likely at the top of the S curve (before the trough of disillusionment), with tons of hype and not much focus on risks, and tons of exaggeration about whether it's better than non-AI or whether it even uses AI at all - and that it needs some laws.
Mentions a bunch of boring AI risks to consider (not the more crazy AI singularity transhumanist hype):
- hard to control risks when you don't know what the model is actually doing
- research has shown it's very cheap to poison a model ($60 to control 0.1% of the training data by buying expired domains)
- can be used against us; malware creation (easy to bypass protections by lying to it)
- deepfakes (passed a liveness test and scammed shanghai tax system)
- ai often hallucinates package names (can typosquat those)
great talk.
Declarative Everything
My favourite talk, about my favourite new feature in Kubernetes: validating admission policies.
talked more about it on mastodon and ended up writing kube.rs/admission as a result.
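the short version of why i like it: one CEL expression plus a binding, and no webhook to run. a sketch against the 1.28-era v1beta1 api (the replica-cap policy is just an example):

```rust
use anyhow::Result;
use kube::api::{Api, ApiResource, DynamicObject, Patch, PatchParams};
use kube::core::GroupVersionKind;
use kube::Client;
use serde_json::{json, Value};

async fn apply(api: &Api<DynamicObject>, name: &str, obj: &Value) -> Result<()> {
    api.patch(name, &PatchParams::apply("talk-notes"), &Patch::Apply(obj)).await?;
    Ok(())
}

#[tokio::main]
async fn main() -> Result<()> {
    let client = Client::try_default().await?;
    let policy_ar = ApiResource::from_gvk(&GroupVersionKind::gvk(
        "admissionregistration.k8s.io", "v1beta1", "ValidatingAdmissionPolicy"));
    let binding_ar = ApiResource::from_gvk(&GroupVersionKind::gvk(
        "admissionregistration.k8s.io", "v1beta1", "ValidatingAdmissionPolicyBinding"));
    let policies: Api<DynamicObject> = Api::all_with(client.clone(), &policy_ar);
    let bindings: Api<DynamicObject> = Api::all_with(client, &binding_ar);

    // the policy: pure CEL against the incoming object, evaluated in the apiserver
    let policy = json!({
        "apiVersion": "admissionregistration.k8s.io/v1beta1",
        "kind": "ValidatingAdmissionPolicy",
        "metadata": { "name": "replica-cap" },
        "spec": {
            "failurePolicy": "Fail",
            "matchConstraints": {
                "resourceRules": [{
                    "apiGroups": ["apps"],
                    "apiVersions": ["v1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["deployments"]
                }]
            },
            "validations": [{
                "expression": "object.spec.replicas <= 10",
                "message": "deployments are capped at 10 replicas"
            }]
        }
    });
    // the binding decides where (and how strictly) the policy actually applies
    let binding = json!({
        "apiVersion": "admissionregistration.k8s.io/v1beta1",
        "kind": "ValidatingAdmissionPolicyBinding",
        "metadata": { "name": "replica-cap" },
        "spec": {
            "policyName": "replica-cap",
            "validationActions": ["Deny"]
        }
    });
    apply(&policies, "replica-cap", &policy).await?;
    apply(&bindings, "replica-cap", &binding).await?;
    Ok(())
}
```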
Safeguarding Clusters: Exploring the Benefits and Navigating the Dangers of Admission Controllers
goes into details about footguns (failurePolicy, latency buildup, default timeout, scope), and some more exotic crazy failures if you accidentally block leases or flowschemas.
notes how it was hard to target negative selections until CEL; matchConditions can now be CEL inside the webhook configuration resource!
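e.g. something like this keeps a failurePolicy: Fail webhook away from leases and kube-system entirely (a sketch; the webhook itself is hypothetical):

```rust
use serde_json::json;

fn main() -> anyhow::Result<()> {
    let webhook_cfg = json!({
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": { "name": "my-validator" }, // hypothetical webhook
        "webhooks": [{
            "name": "validate.example.com",
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            "failurePolicy": "Fail",
            "clientConfig": {
                "service": { "name": "my-validator", "namespace": "default", "path": "/validate" }
            },
            "rules": [{
                "apiGroups": ["*"],
                "apiVersions": ["*"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["*"]
            }],
            // CEL pre-filters evaluated in the apiserver before the webhook is called;
            // keep a failurePolicy=Fail webhook away from leases and kube-system
            "matchConditions": [
                { "name": "exclude-leases", "expression": "request.resource.group != 'coordination.k8s.io'" },
                { "name": "exclude-kube-system", "expression": "request.namespace != 'kube-system'" }
            ]
        }]
    });
    println!("{}", serde_yaml::to_string(&webhook_cfg)?);
    Ok(())
}
```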
15,000 Minecraft Players Vs One K8s Cluster. Who Wins?
very good talk about how a cloud minecraft provider moved from GCP to bare metal with "65% cost reduction".
nice to see the stuff they take advantage of (they still off-load some things to cloud providers), and the heavy use of the cluster api. MinIO and TopoLVM for storage are also very cool.
if nothing else, this talk is worth it for how they deal with lifecycle management of long lived games (how do you terminate the pods?).
short answer; some automatic upgrades with cluster api, some planned maintenances with warnings, and then a very long terminationGracePeriodSeconds on SIGTERM with some advanced termination handlers.
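the in-pod half of that is roughly: catch SIGTERM, stop accepting players, drain, and exit before the (very long) grace period runs out. a generic tokio sketch, not their actual code:

```rust
use std::time::Duration;
use tokio::signal::unix::{signal, SignalKind};

// placeholder for game-server work; in reality: stop matchmaking, announce the
// shutdown to players, wait for sessions to end, snapshot world state, etc.
async fn drain_players() {
    tokio::time::sleep(Duration::from_secs(5)).await;
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;

    // ... normally the game server would be running here ...
    let _ = sigterm.recv().await; // kubelet sends SIGTERM when the pod is deleted

    // we now have up to terminationGracePeriodSeconds (set very high in the pod
    // spec for long-lived game sessions) before the kubelet follows up with SIGKILL
    drain_players().await;
    Ok(())
}
```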
Maintainer Track
Tools for Resolving Difficult Conflicts in Open Source Communities and Projects
imo the most useful thing herein is the highlighting of non-violent communication as a methodology for compassionate communication, proven to have good results (but best if practiced consistently).
The Eight Fallacies of Distributed Cloud Native Communities
similar maintainer points:
- maintainer bw is (not) infinite :: lack of control + lack of empathy towards you = recipe for burnout
- compromise is (not) a rarity and (not) the norm :: everyone has their own agenda
- cost of contributor onboarding is (not) zero :: maintainer bw is not infinite, ownership also hindered by undocumented context => episodic maintainers leave. need to put conscious effort into uplifting and growing existing contributors to avoid gridlock.
- staffing across areas is (not) homogeneous :: some hard areas have very few people who know what is going on
Kubeburned Out? How to get things done efficiently
good tips for making contribution routines if you're employed (e.g. 1h before work, maybe every tuesday + thursday, 1-2h during the weekend).
make your work public:
- always communicate pr status
- ask for help if stuck
- unassign if you cannot work on it, maybe add snippets that may help others
recognise people for their work;
- blog posts are good
- celebrate small achievements
say no - keep yourself healthy;
- take breaks, short or long
good advice, but then it's more about how to take on more responsibility within kubernetes, writing KEPs, and TAG work.
timing improves success;
- propose things at the right point in time
- raise questions at the right point in time (so you're likely to get the right answer)
- establish an async schedule; it makes people more likely to work together with you
find habits that feel right for you. if it's not a high priority for you or whoever pays you, don't work on it.