Senior Platform Engineer – Cloud & ML Platform (m/f/d)

Quantum-Systems · Gilching, BY, Deutschland
Gilching, BY, Deutschland · Hybrid
Remuneration
Not specified
Location
Gilching, BY, Deutschland
Visa sponsorship
Not specified

Job summary

As a Senior Platform Engineer for the Cloud & ML Platform, you will design, deploy, and improve Kubernetes-based platforms for machine learning workloads and collaborate with other teams to provide robust infrastructure.

Benefits

  • Company pension scheme
  • Flexible working hours
  • Mobile Work
  • EGYM Wellpass access
  • Bike-Leasing
  • Corporate Benefits
  • Employee events
  • Lunch-Card
  • Company Shuttle

Qualifications

  • Expertise with Kubernetes in production environments.
  • Experience deploying and maintaining large-scale clusters.
  • Experience with Kubeflow and Metaflow in ML environments.
  • Understanding of MLOps workflows and deployment automation.
  • Experience with GPU-enabled Kubernetes environments.
  • Infrastructure-as-code experience with tools like Terraform and Ansible.
  • Understanding of cloud-native observability.
  • Experience with containerization and CI/CD processes.
  • Familiarity with cloud platforms like Azure and AWS.
  • Scripting skills in Python, Go, or Bash.
  • Ability to analyze infrastructure issues and implement solutions.
  • Structured mindset with a strong sense of ownership.
  • Strong communication skills for collaboration with teams.
  • Proficiency in English.

Responsibilities

  • Design, deploy, operate, and improve Kubernetes-based platforms for machine learning workloads.
  • Build and maintain globally distributed Kubernetes clusters focusing on reliability and security.
  • Manage the lifecycle of ML platform components, including Kubeflow and Metaflow.
  • Enable AI teams to run scalable training and data processing pipelines.
  • Develop infrastructure-as-code and automation workflows.
  • Manage GPU workloads and resource utilization.
  • Improve platform resilience through monitoring and incident response.
  • Collaborate with teams to define platform standards and best practices.
  • Support hybrid and multi-cloud infrastructure scenarios.
  • Evaluate and integrate cloud providers and technologies.
  • Enhance developer experience for ML engineers.
  • Facilitate the transition of AI capabilities from prototype to production.

Skills

Ansible, Argo CD, AWS, Azure, Bash, Flux, GCP, Go, Helm, Kubernetes, Kustomize, OpenStack, Python, Terraform

Relocation

No