Senior Site Reliability Engineer (m/f/d)
Impower GmbH · München, BY, Deutschland
München, BY, Deutschland · Exp: 5+ yrs · Hybrid
Remuneration
Not specified
Location
München, BY, Deutschland
Visa sponsorship
Not specified
Job summary
Impower is seeking a Senior Site Reliability Engineer to own the reliability and operational foundations of its AI-driven ERP platform. The role spans Kubernetes, AWS, CI/CD, observability, and security, with the goal of keeping systems scalable, resilient, and secure. The ideal candidate has 5+ years of experience building and operating production systems in cloud environments.
Benefits
Hybrid setup, Flexible hours, Ownership, Meaningful impact, Modern tech stack, Growth opportunities, Supportive culture, Autonomy, Trust, Collaboration, Onboarding guidance
Qualifications
- 5+ years building and operating production systems in cloud environments
- Real ownership of non-trivial systems at scale
- Deep, hands-on production Kubernetes experience, including operators, networking, autoscaling, and debugging
- Strong working knowledge of EKS, RDS, ALB, IAM, VPC, S3, and operational realities of running services on AWS
- Solid Terraform experience with disciplined IaC practices
- Hands-on experience with ArgoCD, Helm, or equivalent declarative deployment tooling
- Security expertise in cloud-native environments: IAM best practices, secrets management, secure network architecture, container and dependency vulnerability scanning, secure SDLC principles
- Familiarity with compliance frameworks (e.g., ISO 27001, SOC 2)
- Proactive approach to identifying risks and contributing to incident response and audit readiness
- Experience building dashboards, defining SLOs, running incidents, and applying what was learned to improve systems
- Comfortable scripting and building tooling in Python, Go, Bash, or similar
- Excellent written and verbal English (C1+)
- Ability to document decisions, write effective runbooks, and clearly explain tradeoffs
Responsibilities
- Own platform reliability end-to-end
  - Co-own Kubernetes-based platform on AWS, including ingress, autoscaling, service mesh, config, and secrets
  - Ensure platform scales with growth
- Drive CI/CD excellence
  - Evolve GitLab, Terraform, ArgoCD/Helm pipelines for faster, safer delivery of Java/Spring Boot and React applications
  - Provide self-service capabilities for product teams
- Manage cloud infrastructure
  - Design and operate scalable AWS infrastructure (EKS, RDS, ALB, IAM, VPC, S3) using Infrastructure as Code
  - Maintain strong IaC discipline and clear change management
- Strengthen observability
  - Improve Sentry, Grafana, Prometheus, and Loki setup for SLO definition, fast debugging, and confident service operation
- Lead on security
  - Own security posture across infrastructure and application layers (IAM, secrets management, network segmentation, container and dependency scanning, vulnerability management, supply chain security, audit readiness)
  - Embed security as a design constraint
- Improve incident response
  - Strengthen on-call practices, runbooks, and post-incident learning
- Enable product teams
  - Provide tooling, guidance, and self-service capabilities for better operational and deployment practices
- Support broader platform surface, including Temporal workflows, PostgreSQL operations, S3, Estuary CDC pipeline, and AI service infrastructure on GCP/Azure
Skills
Argo CD, AWS, Azure, Bash, EKS, GCP, GitLab, Go, Grafana, Helm, IAM, Java, Kubernetes, Loki, PostgreSQL, Prometheus, Python, S3, Sentry, Terraform, TypeScript, GitLab CI
Languages
English
Industry
Property management, SaaS
Relocation
No