The Problem: When Self-Managed Kubernetes Becomes the Bottleneck

3E provides a SaaS platform for monitoring and analytics services. As the company grew and customer demand increased, their infrastructure came under a level of pressure that a self-managed Kubernetes cluster cannot absorb at scale without dedicated platform engineering resources.

The core issue was availability. Their self-hosted Kubernetes cluster was hosting mission-critical services, and recurring incidents were compromising the reliability their customers depended on. Overloads, unexpected failures, and the operational burden of maintaining the control plane were consuming engineering time that should have been spent on product development.

The situation was not theoretical. Service disruptions were happening with increasing frequency. The team was firefighting cluster-level issues instead of building features. And because the cluster was self-managed, every incident required deep Kubernetes expertise to diagnose and resolve, expertise that was being spread thinner with each passing month.

"We had chosen Kubernetes because it was the right platform for our workloads. But managing it ourselves was creating exactly the kind of operational drag we had been trying to avoid."

3E Engineering Team

Why Out.Cloud

3E needed a partner who could move fast and move confidently. Given the severity of the availability issues, there was no room for a prolonged discovery phase or an experimental migration strategy. They needed engineers who had done this before, under similar constraints, and who could execute without introducing new risks.

Out.Cloud was selected for three reasons:

  • Proven track record migrating Kubernetes clusters from various distributions and vendors to managed services
  • Experienced and certified engineers across Kubernetes, AWS, and infrastructure-as-code tooling
  • Comprehensive service scope covering migration execution, ongoing cluster administration, and cluster security aligned with industry best practices

This was not a consulting engagement that would produce a report and a roadmap. It was a hands-on migration with a clear objective: get 3E onto a stable, managed Kubernetes platform as quickly and safely as possible.

Our Approach: IaC-First, Incremental Migration

The standard big-bang approach to a Kubernetes migration (tear down the old cluster, build the new one, migrate everything at once) was not an option here. 3E's services were mission-critical. Downtime during migration was not acceptable. We needed an approach that was both methodical and safe.

Infrastructure as Code Foundation

Our first step was to provision the new EKS-based cluster entirely through Infrastructure as Code. This was non-negotiable. A manually provisioned cluster would simply recreate the operational risks 3E was trying to escape. Every component of the new infrastructure, from the VPC configuration to the node groups to the IAM policies, was defined in code, version-controlled, and repeatable.

The IaC approach gave us three critical advantages:

  • Reproducibility: The entire cluster could be rebuilt from scratch in minutes, not days
  • Auditability: Every infrastructure change was tracked, reviewed, and versioned
  • Confidence: The team could make changes to infrastructure knowing that rollback was always available
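
To make the idea concrete, here is a minimal Python sketch of what declaring infrastructure as data buys you. It is not the tooling used in this project: the resource names, values, and the plan and fingerprint helpers are hypothetical, and a real IaC tool does far more. It only illustrates that a declared state can be diffed against reality (reproducibility) and hashed for review (auditability).

```python
import hashlib
import json


def plan(desired: dict, actual: dict) -> dict:
    """Compare declared resources with what exists and list the changes needed."""
    create = sorted(k for k in desired if k not in actual)
    delete = sorted(k for k in actual if k not in desired)
    update = sorted(k for k in desired if k in actual and desired[k] != actual[k])
    return {"create": create, "update": update, "delete": delete}


def fingerprint(desired: dict) -> str:
    """Stable hash of the declared state, independent of key order."""
    canonical = json.dumps(desired, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical declared state; every name and value here is illustrative only.
DESIRED = {
    "vpc/main": {"cidr": "10.0.0.0/16"},
    "eks/cluster": {"version": "1.29"},
    "eks/node-group/default": {"min_size": 2, "max_size": 6},
    "iam/role/node": {"managed_policies": ["AmazonEKSWorkerNodePolicy"]},
}

if __name__ == "__main__":
    # Against an empty account, everything is created.
    print(plan(DESIRED, {}))
    # Against a matching environment, the plan is empty.
    print(plan(DESIRED, DESIRED))
```

An empty plan against a live environment is the signal that the code and the cluster agree, which is what makes a rebuild or a rollback a routine operation rather than an emergency.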

The Incremental Migration Strategy

With the EKS cluster provisioned and validated, we moved to the migration itself. Rather than a big-bang cutover, we followed an incremental migration model: workloads were moved from the self-managed cluster to EKS one service at a time, with validation at each step.

This approach allowed us to:

  • Validate each workload's behaviour on EKS before moving the next
  • Identify and resolve compatibility issues in isolation
  • Maintain full service availability throughout the migration window
  • Build confidence progressively, both in the platform and in the team's ability to operate it
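
The one-service-at-a-time model above can be sketched as a simple loop. This is an illustration under stated assumptions rather than the actual migration tooling: the deploy, validate, and roll-back callables and the service names are hypothetical stand-ins for whatever deployment and health-check mechanisms a team already has.

```python
from typing import Callable, Iterable


def migrate_incrementally(
    services: Iterable[str],
    deploy_to_eks: Callable[[str], None],
    validate_on_eks: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> list[str]:
    """Move services one at a time; stop at the first one that fails validation.

    Returns the names of the services that were migrated successfully.
    """
    migrated: list[str] = []
    for name in services:
        deploy_to_eks(name)
        if not validate_on_eks(name):
            # Keep the failure isolated: revert this service only and halt,
            # leaving earlier services on EKS and later ones untouched.
            roll_back(name)
            break
        migrated.append(name)
    return migrated


if __name__ == "__main__":
    events = []
    done = migrate_incrementally(
        ["metrics-api", "alerting", "reports"],  # hypothetical service names
        deploy_to_eks=lambda s: events.append(("deploy", s)),
        validate_on_eks=lambda s: s != "alerting",  # simulate one failure
        roll_back=lambda s: events.append(("rollback", s)),
    )
    print(done)
    print(events)
```

Halting on the first failed validation is the design choice that matters: a compatibility issue surfaces against exactly one service, while everything already migrated keeps running and everything not yet migrated stays on the old cluster.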

Results at a glance:

  • ~0 cluster incidents post-migration
  • EKS: AWS-managed control plane
  • Lower OpEx costs vs. self-managed

The Target Architecture

The final EKS architecture was designed around the principles of managed simplicity, operational visibility, and cost efficiency. The key architectural decisions were:

  • AWS-managed control plane: EKS handles the Kubernetes API server, etcd, and all control plane components. 3E's team no longer needs to patch, upgrade, or troubleshoot control plane failures
  • Managed node groups: Worker nodes are provisioned and managed through EKS managed node groups, with automatic scaling configured to handle workload fluctuations
  • IaC-defined networking: VPC, subnets, security groups, and ingress configuration are all defined in code, ensuring consistency across environments
  • Security best practices: IAM roles for service accounts (IRSA), network policies, and pod security standards are enforced by default

Figure: The IaC-provisioned EKS architecture, from Terraform definitions to production workloads, with security and observability built in.
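
As an illustration of the IRSA item in the list above, the sketch below builds the IAM trust policy that allows a single Kubernetes service account to assume a role through the cluster's OIDC provider. The policy shape follows the documented IRSA pattern; the helper name, account ID, OIDC provider string, namespace, and service account name are placeholders, not 3E's actual values.

```python
import json


def irsa_trust_policy(
    account_id: str, oidc_provider: str, namespace: str, service_account: str
) -> dict:
    """Build a trust policy scoped to one Kubernetes service account.

    oidc_provider is the cluster's OIDC issuer without the "https://" prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account pair may assume the role.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="111122223333",  # placeholder account ID
        oidc_provider="oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",  # placeholder
        namespace="monitoring",  # hypothetical namespace
        service_account="metrics-api",  # hypothetical service account
    )
    print(json.dumps(policy, indent=2))
```

Scoping the condition to a specific service account means a pod receives only the AWS permissions its own workload needs, instead of inheriting whatever the worker node's role allows.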

Execution: Fast, Reliable, Zero Disruption

Given the severity of the availability issues 3E was experiencing, speed was critical. But speed without reliability would have simply traded one set of problems for another. Our execution model was designed to be both fast and safe.

The migration followed a structured sequence:

  1. Infrastructure provisioning: The EKS cluster, networking, and supporting services were provisioned entirely through IaC. This phase established the foundation and was completed before any workload migration began.
  2. Validation and hardening: The new cluster was stress-tested and validated against 3E's specific workload patterns. Security configurations were verified, scaling policies were tuned, and monitoring was established.
  3. Incremental workload migration: Services were migrated one at a time from the self-managed cluster to EKS. Each migration was followed by a stabilisation period where the service was monitored for anomalies.
  4. Decommissioning: Once all workloads were running on EKS and stability was confirmed, the self-managed cluster was decommissioned.
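
The gates between steps 3 and 4 can be expressed as two small checks. This is a hedged sketch, not the monitoring logic used in the project: the choice of error rate as the signal, the tolerance, the minimum sample count, and both function names are assumptions made for illustration.

```python
def is_stable(
    error_rates: list[float],
    baseline: float,
    tolerance: float = 0.5,
    min_samples: int = 5,
) -> bool:
    """Decide whether a migrated service passed its stabilisation period.

    error_rates: samples observed on EKS after the move (0.0 to 1.0).
    baseline:    error rate of the same service on the old cluster.
    tolerance:   allowed relative increase over the baseline (0.5 = 50%).
    """
    if len(error_rates) < min_samples:
        return False  # not enough observations to decide yet
    limit = baseline * (1 + tolerance)
    return all(rate <= limit for rate in error_rates)


def ready_to_decommission(stability_by_service: dict[str, bool]) -> bool:
    """The old cluster may go only when every service is migrated and stable."""
    return bool(stability_by_service) and all(stability_by_service.values())


if __name__ == "__main__":
    # Hypothetical samples for one service, baseline error rate of 1%.
    print(is_stable([0.010, 0.012, 0.009, 0.011, 0.010], baseline=0.01))
    print(ready_to_decommission({"metrics-api": True, "alerting": False}))
```

Treating too few samples as "not stable" keeps the process conservative: a service moves on, and the old cluster is retired, only on positive evidence.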

Throughout the process, 3E's services remained available. There was no maintenance window, no "planned downtime", and no service degradation. The migration happened around the running workloads rather than interrupting them.

Outcomes: A Platform That Works for the Team

The results of the migration were immediate and measurable. But the most important outcome was not any single metric. It was the shift in how 3E's engineering team spent their time.

Operational Stability

With AWS managing the control plane, the recurring cluster incidents that had plagued the self-managed setup were reduced to virtually zero. The engineering team was no longer spending hours diagnosing control plane failures or recovering from unexpected cluster behaviour. The platform was stable, and it stayed stable.

Scalability and Availability

Automatic scaling of the EKS managed node groups meant the cluster could respond to workload fluctuations without manual intervention. Services that had previously been affected by overloads now ran with improved availability and enough headroom to absorb demand spikes without degradation.

Cost Optimisation

Counter-intuitively, the managed service proved more cost-effective than the self-managed cluster. The reduction in operational overhead, the elimination of incident response time, and more efficient resource utilisation through auto-scaling all contributed to lower overall operating costs.

"The move to EKS gave us back the time we were losing to cluster maintenance. Our engineers can focus on what actually matters: building the platform our customers rely on."

3E Engineering Team

Team Focus

This was perhaps the most significant outcome. With the control plane managed by AWS, 3E's team could redirect their energy toward tasks that directly add value: improving the SaaS platform, building new features, and serving their customers. Kubernetes was no longer a source of operational anxiety. It was infrastructure that simply worked.

What This Means for Teams Running Self-Managed Kubernetes

3E's experience is not unique. Many organisations adopted Kubernetes early, chose to self-manage because it seemed like the right decision at the time, and have since found themselves spending more engineering effort on the platform than on the products running on it.

The lesson from this migration is straightforward: if your Kubernetes cluster is consuming more engineering time than it saves, it is time to reconsider the operating model. Managed Kubernetes services like EKS exist precisely to offload the undifferentiated heavy lifting of control plane management, so your team can focus on the work that actually differentiates your business.

The migration does not have to be risky. With an IaC-first approach, incremental workload migration, and the right engineering partner, it can be done safely, quickly, and without service disruption.

The question is not whether managed Kubernetes is better. It is whether your team's time is better spent managing a control plane or building the product.