Site Reliability Engineer / SRE - Cloud Storage - STACKIT (m/f/d)
Schwarz Corporate Solutions · Bad Friedrichshall, BW, Germany · Onsite
Remuneration
Not specified
Location
Bad Friedrichshall, BW, Germany
Visa sponsorship
Not specified
Job summary
Schwarz Digits, the IT and digital division of the Schwarz Group, is seeking an experienced professional to join its team. The role involves maintaining and optimizing the stability and availability of resilient, highly available storage infrastructure, automating provisioning and operating processes, and contributing to a robust and efficient storage architecture. It also includes performance and capacity planning, as well as incident and post-mortem analysis.
Qualifications
- Desire to shape solutions with state-of-the-art cloud technologies.
- Extensive experience with various storage products (e.g., NetApp, Cohesity, Pure, Ceph) in block, object, backup, or file storage.
- Good knowledge of cloud environments and their architectures.
- Expertise in operating storage infrastructure (e.g., solution scenarios, provisioning, scaling, migration, incident response).
- Proficiency in storage infrastructure automation (e.g., Golang, Python, Bash, Ansible).
- Familiarity with containerized systems in the storage environment (e.g., Kubernetes).
- Experience in monitoring, alerting, and logging across the full system (e.g., Prometheus, Grafana, Elasticsearch).
- Experience working with and developing APIs (e.g., REST APIs with Golang and Python).
- Enjoyment of the challenges of operating storage systems (e.g., protocols, troubleshooting, performance analysis, high availability, lifecycle).
- Passion and enthusiasm for new technologies and storage systems.
- Desire to be part of a motivated team that strives for continuous improvement.
- Excellent communication skills in German and English for international, agile teams.
Responsibilities
- Maintain and optimize the stability and availability of resilient, highly available storage infrastructure (block, object, backup, and file storage).
- Ensure stability through proactive monitoring, fault resolution, and prevention.
- Automate provisioning and operating processes in the storage environment.
- Contribute to a robust and efficient storage architecture.
- Take end-to-end responsibility for products provided to customers.
- Analyze and optimize performance of existing systems for future scaling.
- Conduct forward-looking capacity planning.
- Handle major incidents involving storage as part of incident and problem management.
- Derive and implement measures to prevent or mitigate future incidents.
Skills
Ansible, Bash, Ceph, Elasticsearch, Go, Grafana, Kubernetes, Prometheus, Python, REST
Languages
German, English
Relocation
No