Senior SRE / Platform Engineer (m/f/d)
SimScale GmbH · Home Office, Germany
Exp: 5+ yrs · Remote
Remuneration
Not specified
Location
Home Office, Germany
Visa sponsorship
Not specified
Job summary
SimScale is seeking a Senior SRE / Platform Engineer to own and improve its cloud infrastructure, focusing on AWS, EKS, observability, disaster recovery, security, and multi-region architecture. This individual contributor role involves building standards, guardrails, and self-service tooling for engineering teams, with a path toward tech-lead ownership.
Benefits
- Unlimited growth opportunities
- Leadership potential
- Flexible hours
- Remote work flexibility
- Comprehensive health coverage
- Retirement plans
- Paid time off
- Wellness support
- Office lunches
- Gift cards for remote employees
- Online/offline learning
- Language courses
- Tech talks
- Team events
- Support groups
- ESG initiatives
- DE&I initiatives
- Team challenges and competitions
Qualifications
- 5+ years of professional experience in SRE, platform, or infrastructure engineering
- Background in software development with a transition to SRE
- Ability to write production-quality software in Python, Go, Rust, or Java
- Strong understanding of Linux internals and distributed systems for debugging complex production behavior
- Hands-on experience with AWS or GCP
- Experience with declarative infrastructure using Terraform
- Experience with GitOps workflows using Argo CD
- Experience with container orchestration using Kubernetes
- Experience with OpenTelemetry, Prometheus, distributed tracing, monitoring, and meaningful SLOs/SLIs
- Ability to investigate complex failures and communicate clearly during incidents
- Ability to translate findings into durable improvements
- Understanding of how infrastructure decisions affect access control, auditability, disaster recovery, logging, and SOC 2 standards
- Ability to explain trade-offs to engineering teams
- Ability to help others adopt better platform practices
Responsibilities
- Own and improve cloud infrastructure for SimScale's browser-based simulation platform
- Build standards, guardrails, and self-service tooling for engineering teams
- Raise reliability and security without slowing engineering velocity
- Evolve the Kubernetes platform
- Evaluate and adopt technologies like Kubernetes Gateway API and service mesh patterns
- Coordinate platform evolution across engineering teams
- Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics
- Help teams define meaningful SLOs
- Shape multi-region architecture and data residency
- Support the transition to a global, multi-cloud architecture that meets disaster recovery and data residency requirements
- Manage cloud cost and efficiency for petabyte-scale infrastructure
- Ensure infrastructure is cost-efficient, secure, and well-instrumented
- Improve tooling by building self-service AWS account provisioning
- Develop guardrails and AI-assisted automations for safe and efficient infrastructure management
Skills
Argo CD, AWS, EKS, GCP, Go, Java, Kubernetes, Linux, Make, OpenTelemetry, Prometheus, Python, Rust, Terraform
Languages
Python, Go, Rust, Java
Work schedule
Flexible hours
Relocation
No