The Challenge: AWS Complexity at Scale

Voith Drive applications are web-based courses built to upskill employees at every stage of digital transformation. As the platform grew, so did the complexity of the AWS infrastructure supporting it. What began as a handful of manually configured services had evolved into a sprawling environment that no single engineer could fully map.

The challenges were interconnected. AWS services were becoming harder to orchestrate as the platform expanded. Infrastructure was being provisioned through a mix of console clicks and ad-hoc scripts, creating configuration drift between environments. The CI/CD pipeline had been bolted together over time rather than designed, leading to inconsistent deployments and fragile release processes.

Monitoring was reactive rather than proactive. When something broke, the team would scramble to find the root cause across disconnected logging tools. Security and access management had been layered on incrementally, with no unified model for who could do what, and no audit trail that could survive scrutiny.

"We had grown the platform faster than our ability to manage it. Every new feature meant more AWS services, more manual steps, and more things that could break without anyone noticing until a user reported it."

Voith Drive Engineering Lead

Our Approach: Infrastructure as Code First

Out.Cloud was selected as the ideal partner for its proficiency in developing and maintaining infrastructure using AWS services and DevOps tools. Rather than patching individual pain points, we recommended a foundational shift: treat infrastructure as an engineering discipline, not an operational afterthought.

Our strategy was built on three pillars. First, codify all infrastructure so that every environment could be reproduced from a single source of truth. Second, automate the entire path from code commit to production deployment. Third, instrument everything so that the team could observe system behaviour in real time rather than investigating incidents after the fact.

This was not about adding more tools to the stack. It was about replacing fragmented manual processes with a coherent, automated system that would scale with the platform rather than against it.

Infrastructure Automation with Terraform and Packer

The first phase focused on bringing the entire AWS infrastructure under Terraform management. Every VPC, subnet, security group, load balancer, and compute instance was defined in version-controlled configuration files. This brought configuration drift under control: if the running infrastructure did not match the code, terraform plan would surface the difference and the next terraform apply would reconcile it.
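As an illustration of how drift can be surfaced automatically, the sketch below wraps `terraform plan -detailed-exitcode`, which exits with 0 when nothing would change, 1 on error, and 2 when the plan contains changes. The names `PlanResult`, `classify_plan_exit_code`, and `check_drift`, and the idea of running this on a schedule, are assumptions made for the example and not a description of the tooling used in the engagement.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    IN_SYNC = "in_sync"  # exit code 0: code and infrastructure agree
    ERROR = "error"      # exit code 1: the plan itself failed
    DRIFT = "drift"      # exit code 2: the plan found a difference


def classify_plan_exit_code(code: int) -> PlanResult:
    """Map the exit code of `terraform plan -detailed-exitcode` to a result."""
    if code == 0:
        return PlanResult.IN_SYNC
    if code == 2:
        return PlanResult.DRIFT
    # Anything else is treated as a failure so drift is never silently missed.
    return PlanResult.ERROR


def check_drift(workdir: str) -> PlanResult:
    """Run a plan in `workdir` (a hypothetical module path) and classify it.

    Requires the terraform binary on PATH and an initialised working directory.
    """
    completed = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(completed.returncode)
```

A scheduled job could call `check_drift` for each environment and raise an alert on `PlanResult.DRIFT`, leaving the decision to apply with a human reviewer. Note that exit code 2 also covers code changes that have not been applied yet, so it signals any gap between code and running infrastructure, not only out-of-band edits.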

For machine images, we introduced Packer to automate AMI creation. Instead of manually configuring EC2 instances and hoping the next deployment would match, Packer built standardised, tested images that could be deployed identically across development, staging, and production. The benefits were immediate:

  • Reproducible environments -- development, staging, and production ran identical configurations
  • Automated provisioning -- new infrastructure was deployed through terraform apply, not console clicks
  • Version-controlled configuration -- every infrastructure change was reviewable, auditable, and reversible
  • Faster recovery -- if an environment was corrupted, it could be rebuilt from code in minutes
  • Reduced human error -- the most common source of outages was eliminated

Technology stack: Terraform (infrastructure as code), Datadog (full-stack observability), CI/CD (automated pipelines).

CI/CD Pipeline: From Ad-Hoc to Automated

The existing deployment process at Voith Drive was a sequence of manual steps: pull the latest code, run tests locally, build artifacts on a developer's machine, then SSH into servers to deploy. Each step introduced risk. Each step depended on tribal knowledge that lived in one or two people's heads.

Out.Cloud designed and implemented a fully automated CI/CD pipeline that followed best practices for repository management and deployment orchestration. The pipeline handled:

  • Automated builds triggered on every commit to the main branch
  • Comprehensive test suites executed before any artifact was promoted
  • Packer-based image builds with baked-in application configurations
  • Blue-green deployments on AWS to eliminate downtime during releases
  • Automatic rollback triggers if health checks failed post-deployment
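
The blue-green and rollback steps in the list above can be sketched in a few lines. This is a simplified model under stated assumptions, not the pipeline Out.Cloud built: `Router` stands in for a load balancer, and `deploy` and `is_healthy` are hypothetical callbacks that a real pipeline would back with its own deployment and health-check logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Stand-in for a load balancer that sends all traffic to one environment."""
    live: str  # "blue" or "green"


def blue_green_deploy(
    router: Router,
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> str:
    """Deploy to the idle environment, switch traffic, roll back on failure.

    Returns the name of the environment serving traffic afterwards.
    """
    previous = router.live
    candidate = "green" if previous == "blue" else "blue"

    deploy(candidate)              # the release only touches the idle side
    if not is_healthy(candidate):  # pre-switch check: never expose a bad build
        return router.live

    router.live = candidate        # cut traffic over to the new release
    if not is_healthy(candidate):  # post-deployment check under live traffic
        router.live = previous     # automatic rollback to the old environment
    return router.live
```

Because the previous environment stays untouched until the next release, rolling back is a routing change rather than a redeploy, which is what keeps releases free of downtime.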

The result was a deployment process that was faster, safer, and completely independent of any individual engineer's availability. What previously took hours of careful manual work now completed in minutes, with every step logged and auditable.

Monitoring and SLO Contracts

The third pillar of the transformation was observability. Before Out.Cloud's engagement, monitoring at Voith Drive consisted of CloudWatch alarms that fired too often (and were therefore ignored) and a shared Slack channel where engineers posted error messages they happened to notice.

We implemented Datadog as the unified observability platform, covering infrastructure metrics, application performance monitoring, log aggregation, and synthetic checks. But the tooling was only half the story. The more important change was the introduction of Service Level Objectives (SLOs).

For each critical service, we defined measurable SLOs -- latency, availability, error rate -- and configured Datadog to track error budgets in real time. When a service consumed too much of its error budget, alerts fired to Slack with enough context for an engineer to diagnose the issue without logging into six different dashboards.
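
To make the error-budget idea concrete, here is a minimal sketch of the arithmetic for a request-based availability SLO. The function names and the 25% alert threshold are assumptions chosen for illustration; in the engagement itself this tracking was configured in Datadog.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent).

    slo_target is the required success ratio, e.g. 0.999 for "99.9%".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def should_alert(
    slo_target: float, total: int, failed: int, threshold: float = 0.25
) -> bool:
    """Alert once less than `threshold` of the budget is left (assumed policy)."""
    return error_budget_remaining(slo_target, total, failed) < threshold
```

With a 99.9% target and 1,000,000 requests in the window, the budget is about 1,000 failed requests. After 250 failures roughly three quarters of the budget remains and no alert fires; after 800 failures only about a fifth remains and `should_alert` returns `True`.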

The automated infrastructure pipeline: Terraform provisions, Packer builds images, CI/CD deploys, and Datadog monitors the entire loop with SLO contracts.

Security and Access Management

With infrastructure now defined as code, security policies could be embedded directly into the provisioning process. We implemented least-privilege IAM policies across all AWS accounts, ensuring that every service and every engineer had exactly the permissions they needed and nothing more.

Access management was centralised through AWS IAM roles and policies managed via Terraform. This meant that permission changes were version-controlled, peer-reviewed, and auditable. No more ad-hoc permission grants through the AWS console. No more shared credentials. No more uncertainty about who had access to what.
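
As a rough illustration of what least privilege looks like in practice, the sketch below builds a narrowly scoped IAM policy document and flags wildcard grants that a reviewer would want to question. The policy follows the standard IAM JSON structure, but the helper names and the bucket name are hypothetical, and the source describes policies managed through Terraform, not through Python.

```python
import json
from typing import List


def read_only_bucket_policy(bucket: str) -> dict:
    """Policy that allows reading objects from one bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def wildcard_findings(policy: dict) -> List[str]:
    """List Allow statements that grant '*' / 'service:*' actions or '*' resources."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # IAM accepts either a single string or a list for both fields.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        sid = stmt.get("Sid", "<no Sid>")
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action}")
        if "*" in resources:
            findings.append(f"{sid}: wildcard resource")
    return findings


if __name__ == "__main__":
    # "example-course-assets" is a made-up bucket name for the demonstration.
    print(json.dumps(read_only_bucket_policy("example-course-assets"), indent=2))
```

A check like `wildcard_findings` could run during peer review so that broad grants are caught before a permission change is merged.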

Integration compatibility -- one of Voith Drive's original concerns -- was addressed by standardising how services communicated. API contracts were documented, network policies were codified, and inter-service dependencies were made explicit in the Terraform modules. When a new service was added, the infrastructure code defined exactly how it integrated with everything else.

Outcomes: A Platform Built to Scale

The transformation delivered measurable improvements across every dimension that Voith Drive had identified as problematic:

  • Infrastructure management: Automated provisioning on AWS through Terraform rather than the console, saving time and cost. Environments that previously took days to provision could now be created in minutes from Terraform code.
  • CI/CD pipeline: Automation of repository management and deployments, following best practices. Releases went from multi-hour manual processes to automated pipelines completing in minutes.
  • Monitoring and performance: Effective resource monitoring through Datadog and Slack, ensuring adherence to SLO contracts. The team moved from reactive firefighting to proactive capacity management.
  • Security posture: Unified access management with full audit trail. Every permission change was tracked, reviewed, and reversible.
  • Operational confidence: The team could deploy, monitor, and recover with predictable processes rather than heroic individual effort.

"Out.Cloud did not just fix our infrastructure -- they taught us how to think about infrastructure as a product. The combination of Terraform, Packer, and Datadog gave us a system that is not only more reliable but fundamentally more understandable."

Voith Drive Technical Director

What This Means for Your Organisation

The Voith Drive engagement illustrates a pattern we see repeatedly: organisations that have grown their cloud footprint faster than their ability to manage it. The symptoms are always similar -- manual provisioning, inconsistent environments, fragile deployments, reactive monitoring, and a growing sense that the infrastructure is becoming a liability rather than an enabler.

The solution is not more people doing the same manual work faster. It is a shift in how infrastructure is treated: as code that can be versioned, tested, and deployed with the same rigour applied to application code. When you combine infrastructure as code (Terraform), image automation (Packer), automated pipelines (CI/CD), and real-time observability (Datadog), you get a platform that becomes easier to manage as it grows rather than harder.

Voith Drive's digital learning platform is now built on foundations that can scale with the business. That is the real outcome of this engagement: not a one-time cleanup, but a permanent upgrade in how the platform is built, deployed, and operated.