
real-time kubernetes cost optimization with lumina and veneer

Published on 03 Apr 2026

About

We had a problem that I think most teams running Kubernetes on AWS at scale eventually hit: we were paying for Reserved Instances and Savings Plans, but we had no way to tell if we were actually using them. Worse, our node provisioner was actively launching spot instances while pre-paid capacity sat idle. I built two open-source tools to fix this — Lumina for cost visibility and Veneer for cost-aware provisioning.

The Visibility Problem

AWS Savings Plans and Reserved Instances are great for reducing compute costs — you commit to a certain spend level and get a discount in return. The catch is that the discount applies at the organization level, across all accounts, and the allocation logic is non-trivial.

When you want to know “what does this Kubernetes node actually cost me right now?”, the answer depends on:

Whether the instance type matches a Reserved Instance (exact type, exact AZ, same account)
Whether an EC2 Instance Savings Plan covers its instance family in that region
Whether a Compute Savings Plan has remaining capacity to cover it
How much of that capacity is already consumed by instances in other accounts
Whether the instance is spot (which gets spot pricing regardless of RI/SP commitments)

None of the existing tools gave me this. Kubecost and OpenCost use published on-demand rates or simple spot pricing — they don’t understand your organization’s Savings Plans allocation. The AWS Cost and Usage Reports (CUR) are comprehensive but delayed by hours, massive in size, and not something you can query in real time. The Cost Explorer API gives you aggregates, not per-instance breakdowns.

We also had multiple capacity managers in play — Karpenter, spot.io, and others — each launching instances across multiple accounts. There was no unified view of what anything actually cost after discounts.

Lumina: Seeing What You’re Actually Paying

Lumina is a Kubernetes controller I built that solves the visibility problem. It queries AWS across your entire organization — every account, every region — and builds a real-time picture of what each EC2 instance costs after all discounts are applied.

The core of it is a Savings Plans allocation algorithm that mirrors how AWS applies discounts, in strict priority order:

Spot instances get spot market pricing (no RI/SP applied)
Reserved Instances match first — exact instance type, exact AZ, same account
EC2 Instance Savings Plans apply next — specific instance family in a specific region
Compute Savings Plans apply last — any instance family, any region
On-Demand is whatever’s left
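The priority order above can be sketched as a single allocation pass. This is a minimal illustration under stated assumptions, not Lumina's code. The types and field names are hypothetical. RIs are modelled as a count of unused slots at a fixed hourly rate, and Savings Plans as unused commitment in $/hour with a flat discount. An instance is either covered in full or not at all. AWS can also cover part of an instance's usage, and it applies Savings Plans to the highest-discount usage first rather than in list order.

```go
package main

import "fmt"

// Instance is a hypothetical, simplified view of one running EC2 instance.
type Instance struct {
	ID, Type, Family, Region, AZ, Account string
	Spot                                  bool
	SpotPrice                             float64 // $/hour, used only for spot
	OnDemand                              float64 // published on-demand $/hour
}

// ReservedInstance is a hypothetical RI pool with Count unused slots.
type ReservedInstance struct {
	Type, AZ, Account string
	Count             int
	RIRate            float64 // effective $/hour of one reserved slot
}

// SavingsPlan is a hypothetical SP pool. Empty Family/Region match anything
// (a Compute Savings Plan). Remaining is unused commitment in $/hour and
// Discount is the fraction taken off the on-demand rate.
type SavingsPlan struct {
	Family, Region      string
	Remaining, Discount float64
}

// Allocation records what covered an instance and its effective $/hour.
type Allocation struct {
	Source string
	Cost   float64
}

// coverByRI uses one slot of an RI matching type, AZ and account exactly.
func coverByRI(in Instance, ris []ReservedInstance) (float64, bool) {
	for i := range ris {
		if r := &ris[i]; r.Count > 0 && r.Type == in.Type && r.AZ == in.AZ && r.Account == in.Account {
			r.Count--
			return r.RIRate, true
		}
	}
	return 0, false
}

// coverBySP covers the whole instance from the first matching plan that has
// enough remaining commitment, and consumes that commitment.
func coverBySP(in Instance, plans []SavingsPlan) (float64, bool) {
	for i := range plans {
		p := &plans[i]
		if (p.Family != "" && p.Family != in.Family) || (p.Region != "" && p.Region != in.Region) {
			continue
		}
		if rate := in.OnDemand * (1 - p.Discount); p.Remaining >= rate {
			p.Remaining -= rate
			return rate, true
		}
	}
	return 0, false
}

// Allocate applies spot, RI, EC2 Instance SP, Compute SP, then on-demand, in
// that order, processing instances in slice order. It mutates the pools.
func Allocate(insts []Instance, ris []ReservedInstance, ec2SPs, computeSPs []SavingsPlan) map[string]Allocation {
	out := make(map[string]Allocation, len(insts))
	for _, in := range insts {
		if in.Spot {
			out[in.ID] = Allocation{Source: "spot", Cost: in.SpotPrice}
		} else if cost, ok := coverByRI(in, ris); ok {
			out[in.ID] = Allocation{Source: "reserved", Cost: cost}
		} else if cost, ok := coverBySP(in, ec2SPs); ok {
			out[in.ID] = Allocation{Source: "ec2-instance-sp", Cost: cost}
		} else if cost, ok := coverBySP(in, computeSPs); ok {
			out[in.ID] = Allocation{Source: "compute-sp", Cost: cost}
		} else {
			out[in.ID] = Allocation{Source: "on-demand", Cost: in.OnDemand}
		}
	}
	return out
}

func main() {
	base := Instance{Type: "m8i.xlarge", Family: "m8i", Region: "us-east-1",
		AZ: "us-east-1a", Account: "111111111111", OnDemand: 0.2117}
	a, b, c, d := base, base, base, base
	a.ID, a.Spot, a.SpotPrice = "i-a", true, 0.09
	b.ID, c.ID, d.ID = "i-b", "i-c", "i-d"
	// Hypothetical pools: one RI slot, an EC2 Instance SP with room for one
	// instance, and a Compute SP too small for a whole m8i.xlarge.
	alloc := Allocate([]Instance{a, b, c, d},
		[]ReservedInstance{{Type: "m8i.xlarge", AZ: "us-east-1a", Account: "111111111111", Count: 1, RIRate: 0.12}},
		[]SavingsPlan{{Family: "m8i", Region: "us-east-1", Remaining: 0.13, Discount: 0.4}},
		[]SavingsPlan{{Remaining: 0.10, Discount: 0.3}})
	for _, id := range []string{"i-a", "i-b", "i-c", "i-d"} {
		fmt.Printf("%s %-15s $%.4f/h\n", id, alloc[id].Source, alloc[id].Cost)
	}
}
```

With these hypothetical pools, i-a keeps its spot price, i-b takes the single RI slot, i-c consumes the EC2 Instance Savings Plan, and i-d finds the Compute Savings Plan too small and falls through to on-demand.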

Lumina runs this allocation every few minutes and exposes Prometheus metrics for each instance. The two key metrics are ShelfPrice (what the instance would cost at on-demand rates) and EffectiveCost (what it actually costs after RI/SP coverage). The difference between those two numbers is money you’re saving. If EffectiveCost equals ShelfPrice, the instance is getting no discount at all, which is money left on the table whenever commitments sit unused elsewhere in the organization.

The model is rate-based — it gives you $/hour snapshots of the current state rather than trying to replicate AWS’s cumulative billing. It’s an estimate, not a replacement for your AWS bill, but it’s accurate enough to drive real-time decisions. And that’s the point.

The Provisioning Problem

With Lumina running, I could see the problem clearly for the first time. I’d look at the metrics and find situations like this:

The m8i family had unused Savings Plans capacity at an effective rate of ~$0.13/hour for an m8i.xlarge
Karpenter was launching m8i.xlarge spot instances at ~$0.09/hour — except during high-demand periods when spot prices spike above $0.15/hour
During those spikes, we were paying more for spot than the Savings Plan rate we’d already committed to

Karpenter has become the de facto node provisioner for Kubernetes on AWS, but it has a well-known gap: it doesn’t understand Savings Plans or Reserved Instances. It uses published on-demand pricing and spot market rates to make decisions, with no awareness of your organization’s pre-paid commitments. This has been an open request for years, with related issues around RI/SP-aware provisioning and first-class Savings Plans support — but as of today, Karpenter still treats every on-demand instance as full price.

From Karpenter’s perspective, an on-demand m8i.xlarge costs $0.2117/hour and spot is ~$0.09/hour, so spot wins. Every time. It doesn’t know that your Savings Plan brings the effective on-demand rate down to $0.13/hour — which during a spot price spike is actually the cheaper option.
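The comparison above reduces to a price-only choice between two offerings. This sketch is a deliberate simplification, not Karpenter's selection logic, which also weighs availability and its own allocation strategy. The `Offering` type and `cheapest` function are hypothetical, and the prices are the m8i.xlarge figures quoted in this post.

```go
package main

import "fmt"

// Offering is a hypothetical (capacity type, $/hour) pair for one instance type.
type Offering struct {
	CapacityType string // "spot" or "on-demand"
	Price        float64
}

// cheapest returns the lowest-priced offering; offers must be non-empty and
// ties go to the earlier entry.
func cheapest(offers []Offering) Offering {
	best := offers[0]
	for _, o := range offers[1:] {
		if o.Price < best.Price {
			best = o
		}
	}
	return best
}

func main() {
	const published, spRate = 0.2117, 0.13 // m8i.xlarge on-demand vs effective SP rate
	for _, spot := range []float64{0.09, 0.15} { // normal vs spiked spot price
		blind := cheapest([]Offering{{"on-demand", published}, {"spot", spot}})
		aware := cheapest([]Offering{{"on-demand", spRate}, {"spot", spot}})
		fmt.Printf("spot=$%.2f  published pricing -> %s  SP-aware pricing -> %s\n",
			spot, blind.CapacityType, aware.CapacityType)
	}
}
```

Under published pricing, spot wins at both spot prices. Once the Savings Plan rate replaces the published on-demand price, the choice flips to on-demand during the $0.15 spike.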

The result is that your Savings Plans sit partially utilized while you pay spot rates on top of them. You’re paying twice — once for the commitment you’re not using, and again for the spot instances you didn’t need.

Veneer: Closing the Loop

Veneer is the second piece — a Kubernetes controller I built that reads Lumina’s cost metrics from Prometheus and translates them into provisioning decisions.

It works through Karpenter’s NodeOverlay mechanism — an alpha feature that lets you adjust Karpenter’s simulated pricing for instance types. When Veneer sees that an instance family has available RI/SP capacity that makes on-demand cheaper than spot, it creates a NodeOverlay that reduces that instance family’s simulated price in Karpenter’s scheduling. Karpenter then naturally prefers it.

The key design constraints were:

Don’t over-subscribe. If there’s Savings Plans capacity for 10 more m8i.xlarge instances and Veneer has already steered Karpenter to launch 10, it removes the overlay. The next instance goes back to spot pricing, which is exactly what you want.
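The over-subscription guard comes down to a small piece of bookkeeping: convert unused commitment into whole instance slots, and keep the overlay only while the number of instances already steered is below that count. The functions below are a hypothetical sketch of that rule, and the $1.30/hour figure is an invented example. My assumption is that counting steered launches separately matters because freshly launched nodes may not show up in the metrics straight away. How Veneer actually derives remaining capacity from Lumina's metrics is not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// coveredSlots converts unused commitment ($/hour) into whole instances at
// the given effective rate. The small epsilon guards against a ratio such as
// 1.30/0.13 landing just below 10 in floating point.
func coveredSlots(remainingPerHour, ratePerInstance float64) int {
	if ratePerInstance <= 0 {
		return 0
	}
	return int(math.Floor(remainingPerHour/ratePerInstance + 1e-9))
}

// keepOverlay reports whether the price overlay should stay in place, given
// how many instances have already been steered onto the covered capacity.
func keepOverlay(slots, steered int) bool { return steered < slots }

func main() {
	slots := coveredSlots(1.30, 0.13) // hypothetical: $1.30/h unused at $0.13/h each
	for steered := 8; steered <= 11; steered++ {
		fmt.Printf("steered=%d keep overlay=%v\n", steered, keepOverlay(slots, steered))
	}
}
```

Here there are 10 slots, so the overlay stays for 8 and 9 steered launches and is removed from the 10th onward, after which Karpenter goes back to spot.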

Don’t break Karpenter. Veneer prefers RI/SP-covered instances — it doesn’t require them. If spot capacity is available and RI/SP capacity is exhausted, Karpenter falls back to spot naturally. No NodePool changes, no hard constraints.

React in real time. As nodes launch and terminate, RI/SP utilization changes. Veneer watches these changes through Prometheus and adjusts NodeOverlays accordingly. A Savings Plan that was fully utilized 5 minutes ago might have capacity now because a large instance was terminated.
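Reacting in real time follows the usual controller pattern. On every pass, compute the overlays that should exist from the latest metrics, compare them with the ones that do, and apply the difference. The sketch below shows only that diff step. It uses plain maps keyed by overlay name, with the price adjustment as the value. The `plan` function, the overlay names and the percentages are hypothetical, and a real controller would apply the result through the Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// Actions lists overlay names to create, update, and delete.
type Actions struct {
	Create, Update, Delete []string
}

// plan diffs desired overlays (name -> price adjustment) against existing ones.
func plan(desired, existing map[string]string) Actions {
	var a Actions
	for name, adj := range desired {
		if cur, ok := existing[name]; !ok {
			a.Create = append(a.Create, name)
		} else if cur != adj {
			a.Update = append(a.Update, name)
		}
	}
	for name := range existing {
		if _, ok := desired[name]; !ok {
			a.Delete = append(a.Delete, name)
		}
	}
	// Map iteration order is random; sort for stable, testable output.
	sort.Strings(a.Create)
	sort.Strings(a.Update)
	sort.Strings(a.Delete)
	return a
}

func main() {
	existing := map[string]string{"veneer-m8i": "-39%", "veneer-c7i": "-30%"}
	// Hypothetical new metrics: m8i's discount shrank, c7i's commitment is used
	// up, and r8i has capacity again after a large instance terminated.
	desired := map[string]string{"veneer-m8i": "-35%", "veneer-r8i": "-25%"}
	a := plan(desired, existing)
	fmt.Printf("create=%v update=%v delete=%v\n", a.Create, a.Update, a.Delete)
}
```

This prints `create=[veneer-r8i] update=[veneer-m8i] delete=[veneer-c7i]`. Diffing against current state rather than reacting to individual events means a missed change is corrected on the next pass.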

How They Work Together

The full pipeline looks like this:

AWS Organization (RIs, Savings Plans, Spot Prices, EC2 Instances)
    │
    ▼
Lumina (calculates per-instance effective costs)
    │
    ▼
Prometheus (stores cost metrics, utilization data)
    │
    ▼
Veneer (reads metrics, creates/updates/deletes NodeOverlays)
    │
    ▼
Karpenter (provisions nodes using adjusted pricing)

Lumina is the data layer — it answers “what does each instance actually cost?” Veneer is the action layer — it uses that data to steer provisioning toward the cheapest option, which might be on-demand when you have RI/SP coverage, or spot when you don’t.

You can run Lumina without Veneer if all you need is cost visibility — the Prometheus metrics are useful on their own for dashboards, alerting, and chargeback. Veneer is the optional next step that closes the feedback loop.

Final Thoughts

The underlying issue is a disconnect between how AWS bills you and how Kubernetes provisions compute. AWS applies discounts at the organization level using a complex priority system. Kubernetes provisioners use published pricing and have no idea your organization has pre-paid commitments. The result is predictable: you pay for capacity you don’t use.

Lumina and Veneer bridge that gap. I built Lumina to make the invisible visible — real costs, in real time, as Prometheus metrics. I built Veneer to act on that visibility and make sure pre-paid capacity gets used before reaching for spot.

Both projects are open source and available on GitHub:

Lumina — real-time Kubernetes cost visibility
Veneer — cost-aware Karpenter provisioning
