Senior Platform Engineer – Cloud & ML Platform (m/f/d)
Quantum-Systems · Gilching, BY, Deutschland
Gilching, BY, Deutschland · Hybrid
Remuneration
Not specified
Location
Gilching, BY, Deutschland
Visa sponsorship
Not specified
Job summary
As a Senior Platform Engineer for the Cloud & ML Platform, you will design, deploy, and improve Kubernetes-based platforms for machine learning workloads, and collaborate with other teams to provide robust infrastructure.
Benefits
- Company pension scheme
- Flexible working hours
- Mobile Work
- EGYM Wellpass access
- Bike-Leasing
- Corporate Benefits
- Employee events
- Lunch-Card
- Company Shuttle
Qualifications
- Expertise with Kubernetes in production environments.
- Experience deploying and maintaining large-scale clusters.
- Experience with Kubeflow and Metaflow in ML environments.
- Understanding of MLOps workflows and deployment automation.
- Experience with GPU-enabled Kubernetes environments.
- Infrastructure-as-code experience with tools like Terraform and Ansible.
- Understanding of cloud-native observability.
- Experience with containerization and CI/CD processes.
- Familiarity with cloud platforms like Azure and AWS.
- Scripting skills in Python, Go, or Bash.
- Ability to analyze infrastructure issues and implement solutions.
- Structured mindset with a strong sense of ownership.
- Strong communication skills for collaboration with teams.
- Proficient in English.
Responsibilities
- Design, deploy, operate, and improve Kubernetes-based platforms for machine learning workloads.
- Build and maintain globally distributed Kubernetes clusters focusing on reliability and security.
- Manage the lifecycle of ML platform components, including Kubeflow and Metaflow.
- Enable AI teams to run scalable training and data processing pipelines.
- Develop infrastructure-as-code and automation workflows.
- Manage GPU workloads and resource utilization.
- Improve platform resilience through monitoring and incident response.
- Collaborate with teams to define platform standards and best practices.
- Support hybrid and multi-cloud infrastructure scenarios.
- Evaluate and integrate cloud providers and technologies.
- Enhance developer experience for ML engineers.
- Facilitate the transition of AI capabilities from prototype to production.
Skills
Ansible, Argo CD, AWS, Azure, Bash, Flux, GCP, Go, Helm, Kubernetes, Kustomize, OpenStack, Python, Terraform
Relocation
No