Staff Site Reliability Engineer (m/f/d)

ARX Robotics GmbH · München, BY, Deutschland

München, BY, Deutschland · Hybrid

Remuneration

Not specified

Location

München, BY, Deutschland

Visa sponsorship

Not specified

Job summary

ARX Robotics is seeking a Staff Site Reliability Engineer to transform Cloud and IT services into highly reliable, observable, and automated products. This role involves owning critical infrastructure such as Vault/PKI, CI/CD systems, and monitoring platforms, ensuring they are robust, resilient, and continuously improving.

Qualifications

Demonstrated passion for reliability and automation, evidenced by personal projects or workflow automation.
Proven experience in a Site Reliability, DevOps, or Platform Engineering role with responsibility for production systems.
Hands-on experience operating and improving shared services such as CI/CD, secrets management, or monitoring platforms.
Automation-first mindset with scripting skills in Python, Go, or shell.
Strong understanding of observability principles and experience building monitoring for production services.
Ability to write clear and concise documentation, especially for runbooks and incident procedures.
Proactive, collaborative approach to problem-solving and commitment to operational excellence.

Responsibilities

Transform central Cloud and IT services into highly reliable, observable, and automated products.
Take ownership of critical infrastructure including Vault/PKI, CI/CD systems, and monitoring platforms.
Ensure systems are robust, resilient, and continuously improving.
Establish clear service ownership, SLOs, and incident response workflows for shared platform services.
Develop a comprehensive observability practice with meaningful metrics, logs, alerts, and operational dashboards.
Implement resilient and automated patterns for deployment, monitoring, backup, and recovery.
Create pragmatic automations to eliminate recurring operational work and unblock engineering teams.
Maintain highly available and secure shared services like Vault/PKI, build infrastructure, and CI/CD support systems.
Develop actionable runbooks and operational documentation for confident incident response.
Form strong partnerships with engineering teams to establish clear ownership boundaries and improve service handoffs.
Collaborate closely with Backend Engineering to ensure new internal applications are operable from day one.
Foster a culture of reliability by participating in incident response, recovery drills, and blameless post-mortems.

Skills

Bash, Go, Python, Vault, Linux

