
Senior SRE / Platform Engineer (m/f/d)

SimScale GmbH · Home Office, Germany
Experience: 5+ yrs · Remote
Remuneration: Not specified
Location: Home Office, Germany
Visa sponsorship: Not specified

Job summary

SimScale is seeking a Senior SRE / Platform Engineer to own and improve its cloud infrastructure, focusing on AWS, EKS, observability, disaster recovery, security, and multi-region architecture. This individual contributor role involves building standards, guardrails, and self-service tooling for engineering teams, with a path toward tech-lead ownership.

Benefits

  • Unlimited growth opportunities
  • Leadership potential
  • Flexible hours
  • Remote work flexibility
  • Comprehensive health coverage
  • Retirement plans
  • Paid time off
  • Wellness support
  • Office lunches
  • Gift cards for remote employees
  • Online/offline learning
  • Language courses
  • Tech talks
  • Team events
  • Support groups
  • ESG initiatives
  • DE&I initiatives
  • Team challenges and competitions

Qualifications

  • 5+ years of professional experience in SRE, platform, or infrastructure engineering
  • Background in software development with a transition to SRE
  • Ability to write production-quality software in Python, Go, Rust, or Java
  • Strong understanding of Linux internals and distributed systems for debugging complex production behavior
  • Hands-on experience with AWS or GCP
  • Experience with declarative infrastructure using Terraform
  • Experience with GitOps workflows using Argo CD
  • Experience with container orchestration using Kubernetes
  • Experience with OpenTelemetry, Prometheus, distributed tracing, monitoring, and meaningful SLOs/SLIs
  • Ability to investigate complex failures and communicate clearly during incidents
  • Ability to translate findings into durable improvements
  • Understanding of how infrastructure decisions affect access control, auditability, disaster recovery, logging, and SOC 2 standards
  • Ability to explain trade-offs to engineering teams
  • Ability to help others adopt better platform practices

Responsibilities

  • Own and improve cloud infrastructure for SimScale's browser-based simulation platform
  • Build standards, guardrails, and self-service tooling for engineering teams
  • Raise reliability and security without slowing engineering velocity
  • Evolve the Kubernetes platform
  • Evaluate and adopt technologies like Kubernetes Gateway API and service mesh patterns
  • Coordinate platform evolution across engineering teams
  • Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics
  • Help teams define meaningful SLOs
  • Shape multi-region architecture and data residency
  • Support the transition to a global, multi-cloud architecture that meets disaster recovery and data residency requirements
  • Manage cloud cost and efficiency for petabyte-scale infrastructure
  • Ensure infrastructure is cost-efficient, secure, and well-instrumented
  • Improve tooling by building self-service AWS account provisioning
  • Develop guardrails and AI-assisted automations for safe and efficient infrastructure management

Skills

Argo CD, AWS, EKS, GCP, Go, Java, Kubernetes, Linux, Make, OpenTelemetry, Prometheus, Python, Rust, Terraform

Languages

Python, Go, Rust, Java

Work schedule

Flexible hours

Relocation

No