Staff Site Reliability Engineer (m/f/d)

ARX Robotics GmbH · München, BY, Deutschland · Hybrid
Remuneration: Not specified
Location: München, BY, Deutschland
Visa sponsorship: Not specified

Job summary

ARX Robotics is seeking a Staff Site Reliability Engineer to transform Cloud and IT services into highly reliable, observable, and automated products. This role involves owning critical infrastructure such as Vault/PKI, CI/CD systems, and monitoring platforms, ensuring they are robust, resilient, and continuously improving.

Qualifications

  • Demonstrated passion for reliability and automation, evidenced by personal projects or workflow automation.
  • Proven experience in a Site Reliability, DevOps, or Platform Engineering role with responsibility for production systems.
  • Hands-on experience operating and improving shared services such as CI/CD, secrets management, or monitoring platforms.
  • Automation-first mindset with scripting skills in Python, Go, or shell.
  • Strong understanding of observability principles and experience building monitoring for production services.
  • Ability to write clear and concise documentation, especially for runbooks and incident procedures.
  • Proactive, collaborative approach to problem-solving and commitment to operational excellence.

Responsibilities

  • Transform central Cloud and IT services into highly reliable, observable, and automated products.
  • Take ownership of critical infrastructure including Vault/PKI, CI/CD systems, and monitoring platforms.
  • Ensure systems are robust, resilient, and continuously improving.
  • Establish clear service ownership, SLOs, and incident response workflows for shared platform services.
  • Develop a comprehensive observability practice with meaningful metrics, logs, alerts, and operational dashboards.
  • Implement resilient and automated patterns for deployment, monitoring, backup, and recovery.
  • Create pragmatic automations to eliminate recurring operational work and unblock engineering teams.
  • Maintain highly available and secure shared services like Vault/PKI, build infrastructure, and CI/CD support systems.
  • Develop actionable runbooks and operational documentation that enable confident incident response.
  • Form strong partnerships with engineering teams to establish clear ownership boundaries and improve service handoffs.
  • Collaborate closely with Backend Engineering to ensure new internal applications are operable from day one.
  • Foster a culture of reliability by participating in incident response, recovery drills, and blameless post-mortems.

Skills

Bash, Go, Python, Vault, Linux

Relocation

No