The CI/CD Challenge at Scale
SigTech is a leading provider of quant technologies, operating an award-winning platform that enables institutional investors to research, build, and deploy custom quantitative strategies using financial data. As the company scaled rapidly, their engineering team hit a wall that many fast-growing technology companies face: their CI/CD system could not keep up.
The pipelines powering SigTech's development workflow relied on GitHub-hosted runners. These shared runners were adequate for simple workloads, but for a quantitative technology company running complex data pipelines, model training, and multi-stage integration tests, they introduced unpredictable constraints. Build times were inconsistent, resource limits were opaque, and there was no way to tailor runner specifications to the diverse demands of quant pipeline workloads.
Costs were also climbing. As the team grew and release frequency increased, the number of runner minutes consumed per month was rising faster than the engineering headcount. Worse, the team had no visibility into where those minutes were being spent, or how to optimise them.
"We needed CI/CD to be a platform capability, not a black box. Our quant engineers were waiting on builds that should have taken minutes, not hours. And we couldn't see what was happening inside the pipeline."
SigTech Engineering Leadership
Our Approach: Self-Hosted Runners as a Platform
When SigTech engaged Out.Cloud, the brief was straightforward: build a CI/CD system that gave the team complete control over their pipeline infrastructure, without adding operational complexity for developers.
The standard approach would have been to spin up a handful of EC2 instances and register them as GitHub Actions runners. This works for small teams, but it doesn't scale. Runners need to be provisioned, secured, monitored, and de-provisioned. When demand spikes during a release cycle, the cluster needs to grow. When it's quiet, it needs to shrink to control costs.
We recommended a fundamentally different model: treat the runner fleet as a Kubernetes workload on Amazon EKS. This meant runners would be pods, not instances. They would be ephemeral, purpose-built, and managed through the same infrastructure-as-code practices that govern the rest of SigTech's platform.
Architecture Design
The architecture we designed had three layers:
- Amazon EKS cluster -- a dedicated, isolated Kubernetes cluster running in SigTech's AWS account, separate from application workloads
- Custom runner controller -- an operator that listens for GitHub Actions workflow events and provisions the appropriate runner pod, with the right CPU, memory, GPU access, and IAM permissions for each job type
- Observability layer -- all CI/CD interactions export metrics to SigTech's monitoring stack, providing complete visibility into pipeline performance, resource consumption, and bottlenecks
We defined four distinct runner types, each optimised for a specific workload pattern:
- Build runners -- moderate compute for compilation and packaging
- Test runners -- high memory and fast storage for integration test suites
- Deploy runners -- minimal compute but tightly scoped IAM roles for production deployments
- Quant pipeline runners -- GPU access and large memory allocations for data-intensive workloads
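To make the "purpose-built pod" idea concrete, the sketch below shows what a pod template for the quant pipeline runner type could look like. It is illustrative only: the label, image, service account, and resource figures are assumptions, not SigTech's actual configuration.

# Sketch: a possible pod template for the quant pipeline runner type
# (label, image, service account, and resource figures are illustrative)
apiVersion: v1
kind: Pod
metadata:
  generateName: runner-quant-
  labels:
    runner-type: sigtech-quant-gpu
spec:
  serviceAccountName: quant-runner        # scoped IAM role attached via IRSA
  restartPolicy: Never                    # ephemeral: one job per pod, then torn down
  containers:
    - name: runner
      image: ghcr.io/example/actions-runner:latest
      resources:
        requests:
          cpu: "8"
          memory: 64Gi
          nvidia.com/gpu: "1"
        limits:
          memory: 64Gi
          nvidia.com/gpu: "1"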
Security and Isolation
For a company operating in the financial data space, security was non-negotiable. Every runner pod runs in an isolated namespace with network policies that restrict egress to only the services required for that specific job type. Secrets are injected at runtime through AWS Secrets Manager, never baked into images or stored in environment variables.
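As a minimal sketch of that egress restriction, the policy below allows a build runner to reach DNS and one internal endpoint and nothing else. The namespace, labels, and CIDR are placeholders rather than SigTech's actual values.

# Sketch: egress-restricting NetworkPolicy for one runner type
# (namespace, labels, and destination values are placeholders)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: build-runner-egress
  namespace: runners-build
spec:
  podSelector:
    matchLabels:
      runner-type: sigtech-build
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53                        # DNS only
    - to:
        - ipBlock:
            cidr: 10.0.20.0/24            # e.g. an internal artifact registry
      ports:
        - protocol: TCP
          port: 443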
The EKS cluster itself is fully private. There are no public endpoints. All communication between GitHub and the runner controller happens through a secure webhook relay, and all runner registration tokens are short-lived and rotated automatically.
We also implemented pod-level security contexts that enforce read-only root filesystems, drop all Linux capabilities except those explicitly required, and run all workloads as non-root users. For the quant pipeline runners that need GPU access, we configured device plugins with the minimum required permissions.
# Example: runner pod security context
securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
Observability: Seeing Inside the Pipeline
One of SigTech's key requirements was visibility. With GitHub-hosted runners, they could see whether a job passed or failed, but nothing about resource consumption, queue wait times, or infrastructure costs per pipeline.
We built an observability layer that exports metrics from every CI/CD interaction. Every runner pod reports CPU and memory utilisation, job duration, queue wait time, and cache hit or miss status. These metrics flow into SigTech's existing monitoring stack, where engineering leads can see exactly how their pipelines are performing.
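To illustrate the kind of aggregation this enables, the sketch below shows Prometheus-style recording rules that roll up runner minutes per workflow and queue wait percentiles. The metric and label names are hypothetical; the case study does not publish SigTech's actual metric schema.

# Sketch: Prometheus recording rules over hypothetical runner metrics
# (metric and label names are illustrative, not SigTech's actual schema)
groups:
  - name: ci-runner-aggregates
    rules:
      - record: ci:runner_minutes_per_workflow:rate1d
        expr: sum by (workflow) (increase(ci_runner_job_duration_seconds_sum[1d])) / 60
      - record: ci:queue_wait_p95:rate1h
        expr: |
          histogram_quantile(0.95,
            sum by (runner_type, le) (rate(ci_runner_queue_wait_seconds_bucket[1h])))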
This visibility enabled data-driven decisions. Within the first month, the team identified three workflow files that were consuming 40% of total runner minutes due to inefficient caching. After optimising those workflows, they recovered that capacity without adding any infrastructure.
End-to-end flow: GitHub event triggers the controller, which provisions the right runner pod on EKS, executes the job, and exports metrics.
Developer Experience: Self-Service Pipelines
A self-hosted runner cluster is only valuable if developers can use it without filing infrastructure tickets. We designed the system so that creating a new pipeline is a self-service operation. Engineers write their GitHub Actions workflow files as they always have, but instead of specifying runs-on: ubuntu-latest, they specify a custom label like runs-on: sigtech-build-gpu.
The runner controller handles everything else. It provisions the right pod, attaches the right IAM role, mounts the right cache volumes, and tears the pod down when the job completes. From the developer's perspective, the experience is identical to using GitHub-hosted runners, except faster, more predictable, and with resources tailored to their workload.
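From a workflow author's point of view, the change is a single line. The snippet below is illustrative: only the sigtech-build-gpu label comes from the description above, and the job name and steps are invented for the example.

# Sketch: a workflow targeting a custom runner type
# (only the runs-on label reflects the description above; steps are illustrative)
name: model-tests
on: [pull_request]
jobs:
  gpu-tests:
    runs-on: sigtech-build-gpu          # custom label resolved by the runner controller
    steps:
      - uses: actions/checkout@v4
      - name: Run GPU test suite
        run: make test-gpu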
Updates to the runner infrastructure are managed entirely through infrastructure-as-code. When the team needs a new runner type, or when a security patch needs to be applied to runner images, the change goes through the same pull request and review process as any other infrastructure change. No SSH access, no ad-hoc configuration, no drift.
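As one example of how such a change might flow, a review-gated workflow along these lines could rebuild and publish runner images whenever their definitions change. The paths, registry, image names, and runner label are assumptions for illustration, not SigTech's actual pipeline.

# Sketch: rebuilding patched runner images through the normal PR and merge flow
# (paths, registry, and image names are illustrative assumptions)
name: runner-images
on:
  push:
    branches: [main]
    paths:
      - runners/**
jobs:
  build-and-push:
    runs-on: sigtech-build              # hypothetical self-hosted label
    steps:
      - uses: actions/checkout@v4
      - name: Build and push build-runner image
        run: |
          docker build -t ghcr.io/example/build-runner:${{ github.sha }} runners/build
          docker push ghcr.io/example/build-runner:${{ github.sha }}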
Outcomes and Benefits
The self-hosted GitHub Actions cluster on Amazon EKS delivered measurable improvements across every dimension SigTech cared about:
- Improved CI/CD security: The dedicated EKS cluster provided a fully isolated environment with network policies, short-lived credentials, and pod-level security contexts. No shared infrastructure with external workloads.
- Resource control and visibility: Custom runner types deployed for all quant pipeline scenarios. All CI/CD interactions exported metrics, giving the team complete visibility into resource consumption and costs per pipeline.
- Improved maintainability: The self-hosted cluster facilitated a self-service methodology for creating new pipelines. All updates managed through infrastructure-as-code, eliminating manual configuration and drift.
- Better developer experience: Engineers use familiar GitHub Actions syntax with custom labels. No new tools to learn, no tickets to file. Faster builds with purpose-built runner specifications.
- Cost optimisation: Auto-scaling with Karpenter ensures the cluster grows and shrinks with demand (a sketch of a possible node pool configuration follows this list). Right-sizing runner pods to workload requirements eliminated wasted compute.
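For the auto-scaling point, a minimal Karpenter NodePool along these lines would let the cluster add runner nodes under load and consolidate them when idle. This assumes the current karpenter.sh/v1 API, and the names, limits, and timings are illustrative rather than SigTech's configuration.

# Sketch: Karpenter NodePool for CI runner nodes (values are illustrative)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: ci-runners
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: ci-runners
  limits:
    cpu: "256"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m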
"Out.Cloud's mastery in automation and scaling was evident from the start. They didn't just set up runners, they built a platform that scales with us. The visibility alone transformed how we think about our CI/CD costs."
SigTech Engineering
What This Means for Your Organisation
SigTech's story illustrates a pattern we see repeatedly: as engineering teams scale, CI/CD stops being a tool and becomes infrastructure. When that happens, the managed-runner model starts to show its limits. Build times become unpredictable, costs become opaque, and the team loses the ability to tailor the environment to their specific workload requirements.
Self-hosted runners on EKS solve these problems by turning the runner fleet into a Kubernetes-native workload. You get all the benefits of container orchestration -- auto-scaling, resource isolation, infrastructure-as-code, observability -- applied to your CI/CD layer.
The key insight is that this is not just an infrastructure decision. It's a developer experience decision. When pipelines are fast, predictable, and visible, engineers ship faster and with more confidence. That's the real outcome SigTech achieved, and it's the outcome we help every client work toward.