Staff Site Reliability Engineer (m/f/d)
ARX Robotics GmbH · München, BY, Deutschland
München, BY, Deutschland · Hybrid
Remuneration
Not specified
Location
München, BY, Deutschland
Visa sponsorship
Not specified
Job summary
ARX Robotics is seeking a Staff Site Reliability Engineer to transform Cloud and IT services into highly reliable, observable, and automated products. This role involves owning critical infrastructure such as Vault/PKI, CI/CD systems, and monitoring platforms, ensuring they are robust, resilient, and continuously improving.
Qualifications
- Demonstrated passion for reliability and automation, evidenced by personal projects or automated workflows.
- Proven experience in a Site Reliability, DevOps, or Platform Engineering role with responsibility for production systems.
- Hands-on experience operating and improving shared services such as CI/CD, secrets management, or monitoring platforms.
- Automation-first mindset with scripting skills in Python, Go, or shell.
- Strong understanding of observability principles and experience building monitoring for production services.
- Ability to write clear and concise documentation, especially for runbooks and incident procedures.
- Proactive, collaborative approach to problem-solving and commitment to operational excellence.
Responsibilities
- Transform central Cloud and IT services into highly reliable, observable, and automated products.
- Take ownership of critical infrastructure including Vault/PKI, CI/CD systems, and monitoring platforms.
- Ensure systems are robust, resilient, and continuously improving.
- Establish clear service ownership, SLOs, and incident response workflows for shared platform services.
- Develop a comprehensive observability practice with meaningful metrics, logs, alerts, and operational dashboards.
- Implement resilient and automated patterns for deployment, monitoring, backup, and recovery.
- Create pragmatic automations to eliminate recurring operational work and unblock engineering teams.
- Maintain highly available and secure shared services such as Vault/PKI, the build infrastructure, and CI/CD support systems.
- Develop actionable runbooks and operational documentation for confident incident response.
- Form strong partnerships with engineering teams to establish clear ownership boundaries and improve service handoffs.
- Collaborate closely with Backend Engineering to ensure new internal applications are operable from day one.
- Foster a culture of reliability by participating in incident response, recovery drills, and blameless post-mortems.
Skills
Bash, Go, Python, Vault, Linux
Relocation
No