I Wrote Six Kubernetes Operators. Here's What Actually Matters.


Lessons from building production operators at Decloud: the reconciliation loop, controller-runtime patterns, and the mistakes that cost us sleep.

Quick take

Building Kubernetes operators is straightforward. Building ones that don’t page you at 3am isn’t. After six operators in production at Decloud, here are the controller-runtime patterns and design decisions that actually work. Includes Go examples you can steal.


We manage cloud infrastructure at Decloud. VMs, networks, storage volumes – all of it provisioned and lifecycle-managed through Kubernetes operators. Not because it’s trendy. Because our customers expect kubectl apply and then things just work.

I’ve written six operators over the past year and a half. Some are rock solid. Two of them cost me weekends. The difference was never the complexity of the domain. It was always the same three things: how I structured the reconciliation loop, how I designed the CRD, and whether I handled deletion properly.

This is the post I wish I’d had when I started.

What an operator actually is

Strip away the marketing and an operator is just a control loop. You define a custom resource (your API), you write a controller that watches it, and every time something changes, your controller tries to make reality match the spec.

That’s it. The entire pattern:

User writes CR spec -> Controller sees change -> Controller reconciles -> Updates status -> Waits

The “waits” part matters. Your controller isn’t a script that runs once. It runs forever. It gets called when things change, when things break, when Kubernetes restarts it, when someone fat-fingers a kubectl edit. If your reconciler can’t handle being called twice in a row with the same input and produce the same result, you’re going to have a bad time.
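To make "same input, same result" concrete: the decision your reconciler makes should be a pure function of observed state. This is an illustrative sketch, not code from any of our operators – VolumeState and nextAction are made up for the example.

```go
package main

// VolumeState is a toy view of what a reconciler observes.
type VolumeState struct {
	VolumeID   string // empty until provisioned
	AttachedTo string // empty until attached
	WantNode   string // desired node from the spec
}

// nextAction derives the next step purely from observed state.
// Calling it twice with the same input always yields the same
// decision – that's the property that makes reconciliation safe
// to re-run after crashes, restarts, and duplicate events.
func nextAction(s VolumeState) string {
	switch {
	case s.VolumeID == "":
		return "provision"
	case s.AttachedTo != s.WantNode:
		return "attach"
	default:
		return "idle" // steady state: nothing to do
	}
}
```

If your reconcile decision depends on anything other than the observed object (a counter, a cache you mutate, wall-clock state), re-running it stops being safe.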

Setting up with controller-runtime

I use controller-runtime directly. Kubebuilder generates scaffolding on top of it, and that’s fine for getting started, but I prefer knowing exactly what’s under me. Less magic, fewer surprises when something breaks.

Here’s the skeleton. I’ll use a simplified version of one of our Decloud operators – a CloudVolume that provisions storage on a provider API and attaches it to a node.

First, the types:

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type CloudVolumeSpec struct {
    SizeGB   int    `json:"sizeGB"`
    Region   string `json:"region"`
    NodeName string `json:"nodeName"`
}

type CloudVolumeStatus struct {
    VolumeID   string `json:"volumeID,omitempty"`
    Phase      string `json:"phase,omitempty"`
    AttachedTo string `json:"attachedTo,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
type CloudVolume struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              CloudVolumeSpec   `json:"spec,omitempty"`
    Status            CloudVolumeStatus `json:"status,omitempty"`
}

Small spec. Obvious fields. No map[string]interface{} anywhere. I can’t stress this enough: the CRD is your product interface. If a user has to read your source code to figure out what to put in the spec, you’ve already lost.
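One way to enforce "the CRD is your product interface" is kubebuilder validation markers, which generate OpenAPI validation into the CRD so bad input is rejected at admission time instead of at reconcile time. The specific bounds and enum values below are illustrative, not what we actually ship:

```go
package v1alpha1

// Validation markers are plain comments; controller-gen turns them
// into OpenAPI schema constraints in the generated CRD. The limits
// here are examples, not Decloud's real ones.
type CloudVolumeSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=16384
	SizeGB int `json:"sizeGB"`

	// +kubebuilder:validation:Enum=eu-west-1;eu-central-1;us-east-1
	Region string `json:"region"`

	// +kubebuilder:validation:MinLength=1
	NodeName string `json:"nodeName"`
}
```

A rejected kubectl apply with a clear schema error beats a CR that sits in ProvisionFailed forever.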

The reconciliation loop

Here’s the real thing. Not a toy example. This is close to what runs in our production clusters, simplified for clarity but structurally identical.

func (r *CloudVolumeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("cloudvolume", req.NamespacedName)

    var vol v1alpha1.CloudVolume
    if err := r.Get(ctx, req.NamespacedName, &vol); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Handle deletion first. Always.
    if !vol.DeletionTimestamp.IsZero() {
        return r.reconcileDelete(ctx, &vol)
    }

    // Ensure finalizer is present
    if !containsFinalizer(vol.Finalizers, finalizerName) {
        vol.Finalizers = append(vol.Finalizers, finalizerName)
        if err := r.Update(ctx, &vol); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    }

    // Provision if we don't have a volume ID yet
    if vol.Status.VolumeID == "" {
        return r.reconcileProvision(ctx, log, &vol)
    }

    // Attach if not attached
    if vol.Status.AttachedTo != vol.Spec.NodeName {
        return r.reconcileAttach(ctx, log, &vol)
    }

    // Steady state. Check back in 5 minutes.
    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}

Notice the structure. Deletion check first, then finalizer setup, then provision, then attach, then idle. Each branch returns early. No nested ifs six levels deep. No giant switch statement. The reconciler reads top to bottom like a decision tree, and every path is idempotent.

This structure isn’t clever. It’s boring. That’s the point.

The provision step

func (r *CloudVolumeReconciler) reconcileProvision(
    ctx context.Context,
    log logr.Logger,
    vol *v1alpha1.CloudVolume,
) (ctrl.Result, error) {

    log.Info("provisioning volume", "sizeGB", vol.Spec.SizeGB, "region", vol.Spec.Region)

    volumeID, err := r.Provider.CreateVolume(ctx, vol.Spec.SizeGB, vol.Spec.Region)
    if err != nil {
        log.Error(err, "failed to provision volume")
        r.setPhase(ctx, vol, "ProvisionFailed")
        // Don't return the error. Requeue with backoff instead.
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }

    vol.Status.VolumeID = volumeID
    r.setPhase(ctx, vol, "Provisioned")

    if err := r.Status().Update(ctx, vol); err != nil {
        return ctrl.Result{}, err
    }

    log.Info("volume provisioned", "volumeID", volumeID)
    return ctrl.Result{Requeue: true}, nil
}

Two things I learned the hard way here.

Don’t return errors from external API calls. If you return ctrl.Result{}, err, controller-runtime uses exponential backoff, which is fine. But for provider API failures, I want explicit control over retry timing. A provider outage shouldn’t hammer their API with exponential retries that max out at 16 minutes. I return RequeueAfter with a fixed delay and handle the error myself.

Update status immediately after the side effect. If I provision the volume but crash before writing the status, the next reconciliation will call CreateVolume again. That’s why our provider client is idempotent – it checks if a volume with the same labels already exists and returns it. But the status update should still happen as fast as possible to minimize the window.
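The idempotent-create trick is worth spelling out. Our real provider client isn't shown here, but the shape of the idea fits in a few lines – this in-memory stand-in looks up an existing volume by a deterministic label before creating one:

```go
package main

import "fmt"

// memProvider is an in-memory stand-in for a cloud provider client.
// It is illustrative only; the real Decloud client talks to an HTTP API.
type memProvider struct {
	byLabel map[string]string // label -> volume ID
	nextID  int
}

func newMemProvider() *memProvider {
	return &memProvider{byLabel: map[string]string{}}
}

// CreateVolume is safe to call twice with the same label: the second
// call finds the volume created by the first instead of making a
// duplicate. The label would be derived deterministically from the CR
// (e.g. namespace/name), so a crash between CreateVolume and the
// status write costs nothing but a retry.
func (p *memProvider) CreateVolume(label string, sizeGB int) string {
	if id, ok := p.byLabel[label]; ok {
		return id // already provisioned; nothing to do
	}
	p.nextID++
	id := fmt.Sprintf("vol-%03d", p.nextID)
	p.byLabel[label] = id
	return id
}
```
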

Deletion and finalizers

This is where most operators break. I see people skip finalizers because “it’s just Kubernetes resources, they’ll get garbage collected.” Sure, for in-cluster stuff. But we’re provisioning cloud resources. If someone deletes the CloudVolume CR and we don’t clean up the actual volume, we’re leaking money.

const finalizerName = "cloudvolume.decloud.dev/cleanup"

func (r *CloudVolumeReconciler) reconcileDelete(
    ctx context.Context,
    vol *v1alpha1.CloudVolume,
) (ctrl.Result, error) {
    log := r.Log.WithValues("cloudvolume", vol.Name)

    if !containsFinalizer(vol.Finalizers, finalizerName) {
        return ctrl.Result{}, nil
    }

    if vol.Status.VolumeID != "" {
        log.Info("deleting external volume", "volumeID", vol.Status.VolumeID)

        err := r.Provider.DeleteVolume(ctx, vol.Status.VolumeID)
        if err != nil && !isNotFound(err) {
            log.Error(err, "failed to delete external volume")
            return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
        }
    }

    // Remove finalizer
    vol.Finalizers = removeFinalizer(vol.Finalizers, finalizerName)
    if err := r.Update(ctx, vol); err != nil {
        return ctrl.Result{}, err
    }

    log.Info("cleanup complete")
    return ctrl.Result{}, nil
}

The isNotFound check matters. If the volume was already deleted externally (someone cleaned it up in the cloud console, provider auto-deleted it, whatever), we still need to remove the finalizer. Without that check, the CR gets stuck in Terminating forever. Ask me how I know.
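The snippets above lean on helpers I never showed. A plain string-slice version looks like this – though controller-runtime also ships controllerutil.ContainsFinalizer, AddFinalizer, and RemoveFinalizer, which operate on client.Object directly. (isNotFound is provider-specific: it inspects whatever error type your cloud API returns.)

```go
package main

// containsFinalizer reports whether name is present in the
// object's finalizer list.
func containsFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// removeFinalizer returns a copy of the list with every
// occurrence of name removed.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}
```
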

Wiring it up

The main setup is straightforward. This is where you tell controller-runtime what to watch and how to build the manager.

func main() {
    var metricsAddr string
    flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "metrics endpoint")
    flag.Parse()

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:             scheme,
        MetricsBindAddress: metricsAddr,
        Port:               9443,
        LeaderElection:     true,
        LeaderElectionID:   "cloudvolume-controller",
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }

    if err := (&CloudVolumeReconciler{
        Client:   mgr.GetClient(),
        Log:      ctrl.Log.WithName("cloudvolume"),
        Scheme:   mgr.GetScheme(),
        Provider: provider.NewClient(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller")
        os.Exit(1)
    }

    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        setupLog.Error(err, "problem running manager")
        os.Exit(1)
    }
}

LeaderElection: true – don’t skip this. Without it, running two replicas means two controllers fighting over the same resources. We had this bug for exactly one deploy before someone noticed duplicate volumes showing up.

Testing operators

Testing operators is painful. I won’t pretend otherwise. But envtest from controller-runtime makes it tolerable. It spins up a real API server and etcd in-process. No mocks for the Kubernetes API layer.

func TestReconcileProvision(t *testing.T) {
    env := &envtest.Environment{
        CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
    }
    cfg, err := env.Start()
    require.NoError(t, err)
    defer env.Stop()

    k8sClient, err := client.New(cfg, client.Options{Scheme: scheme})
    require.NoError(t, err)

    // Create a CloudVolume
    vol := &v1alpha1.CloudVolume{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "test-vol",
            Namespace: "default",
        },
        Spec: v1alpha1.CloudVolumeSpec{
            SizeGB:   100,
            Region:   "eu-west-1",
            NodeName: "worker-1",
        },
    }
    require.NoError(t, k8sClient.Create(context.TODO(), vol))

    // Run reconciler with a fake provider
    reconciler := &CloudVolumeReconciler{
        Client:   k8sClient,
        Log:      ctrl.Log.WithName("test"),
        Scheme:   scheme,
        Provider: &fakeProvider{volumeID: "vol-abc123"},
    }

    req := ctrl.Request{
        NamespacedName: types.NamespacedName{Name: "test-vol", Namespace: "default"},
    }

    // First reconcile only adds the finalizer and requeues
    result, err := reconciler.Reconcile(context.TODO(), req)
    require.NoError(t, err)
    assert.True(t, result.Requeue)

    // Second reconcile hits the provision branch
    result, err = reconciler.Reconcile(context.TODO(), req)
    require.NoError(t, err)
    assert.True(t, result.Requeue)

    // Verify status was updated
    var updated v1alpha1.CloudVolume
    require.NoError(t, k8sClient.Get(context.TODO(), types.NamespacedName{
        Name: "test-vol", Namespace: "default",
    }, &updated))
    assert.Equal(t, "vol-abc123", updated.Status.VolumeID)
}

Mock the external provider, not Kubernetes. The fake provider interface is trivial – CreateVolume, DeleteVolume, AttachVolume. Maybe five methods total. This gives you fast, reliable tests that catch real API interaction bugs.

Unit tests for pure logic (like “given this spec and this status, what phase should we be in?”) are even simpler. Regular table-driven Go tests. No envtest needed.
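A sketch of what that pure-logic layer looks like, with a hypothetical phaseFor function and a table of cases. In a real test file the loop would run under testing.T; the point is that none of it needs envtest or a cluster:

```go
package main

// statusView is the minimal state phaseFor needs; both the type and
// the function are illustrative, not lifted from our operators.
type statusView struct {
	VolumeID   string
	AttachedTo string
	WantNode   string
}

// phaseFor derives the phase from observed state alone.
func phaseFor(s statusView) string {
	switch {
	case s.VolumeID == "":
		return "Pending"
	case s.AttachedTo != s.WantNode:
		return "Provisioned"
	default:
		return "Attached"
	}
}

// runPhaseTable walks a table of (input, expected phase) cases and
// reports whether they all pass.
func runPhaseTable() bool {
	cases := []struct {
		name string
		in   statusView
		want string
	}{
		{"fresh CR", statusView{WantNode: "w1"}, "Pending"},
		{"provisioned, not attached", statusView{VolumeID: "v1", WantNode: "w1"}, "Provisioned"},
		{"steady state", statusView{VolumeID: "v1", AttachedTo: "w1", WantNode: "w1"}, "Attached"},
	}
	for _, c := range cases {
		if got := phaseFor(c.in); got != c.want {
			return false
		}
	}
	return true
}
```
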

Mistakes I’ve made (so you don’t have to)

Mutating spec in the reconciler. I did this once to “normalize” a field. Created an infinite reconcile loop because every spec update triggered a new event. Took me an embarrassingly long time to figure out why CPU was pinned at 100%.

Not watching owned resources. Our volume operator creates a PersistentVolume as a downstream resource. If someone manually deletes the PV, the operator needs to know. Forgot to add it to the watch and spent a day debugging why volumes were “attached” but pods couldn’t mount them.

func (r *CloudVolumeReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&v1alpha1.CloudVolume{}).
        Owns(&corev1.PersistentVolume{}).  // Watch PVs we create
        Complete(r)
}

Writing status before the side effect completes. Optimistic status updates sound reasonable until your provider call fails after you already told the world the volume exists. Then you have phantom resources in your status that don’t exist anywhere. Always: do the thing, then update the status.

No rate limiting on requeue. Early versions of our operator requeued immediately on every provider error. During a 30-minute cloud outage, we sent thousands of requests to a failing API. Our provider rate-limited us, which made the recovery take even longer. Fixed it by adding RequeueAfter: 30 * time.Second minimum for any external call failure.

When not to write an operator

Operators are the wrong tool when the thing you’re managing doesn’t have meaningful lifecycle beyond deploy-and-forget. A stateless web app with a Deployment, Service, and Ingress? Helm chart. Done.

I write an operator when:

  • The thing has external state that Kubernetes doesn’t know about
  • Day-2 ops involve multi-step coordination (backup, failover, resize)
  • Humans currently follow a runbook that involves kubectl and another API

If none of those apply, you’re adding complexity for no reason. And complexity in controllers is expensive because it runs 24/7 and fails silently.

Where we are now

Six operators. Three manage cloud resources (volumes, networks, VMs). Two handle internal platform concerns (cert rotation, DNS). One manages database provisioning. The three cloud operators are the most complex because external APIs are unreliable and slow. The internal ones are surprisingly simple once you get the pattern down.

Total lines of Go across all six: roughly 12,000. Most of that’s tests. The actual reconciler logic is small in every case. If your reconciler is more than 200-300 lines, you’re probably doing too much in one controller.

The operator pattern works. But it demands discipline. Idempotent reconciliation. Small CRDs. Defensive deletion. Boring code that a sleepy on-call engineer can read at 3am and understand what went wrong.

That last part is the real test.