Staff Machine Learning Infrastructure Engineer, Simulation

Waymo · London, ENG, United Kingdom
London, ENG, United Kingdom · Full time · Exp: 5+ yrs · 155,000-163,000 GBP/yearly · Remote
Remuneration
155,000-163,000 GBP/yearly
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified

Job summary

Waymo is seeking an experienced Staff Machine Learning Infrastructure Engineer to lead the development of AI/ML infrastructure for multi-billion-parameter foundation models in ML accelerator-friendly simulations. The role involves building scalable infrastructure for state-of-the-art simulations, with a focus on large foundation models and ML accelerators. The successful candidate will provide technical leadership, design distributed systems, and mentor junior engineers.

Qualifications

  • Bachelor's degree in Computer Science, Robotics, or a similar technical field, or equivalent practical experience.
  • Five or more years of professional software engineering experience.
  • At least three years of experience in machine learning infrastructure, including developing, scaling, training, deploying, and optimizing large-scale machine learning systems.
  • Master's degree in Computer Science, Robotics, or a similar technical field, or equivalent practical experience is preferred.
  • Ten or more years of professional software engineering experience is preferred.
  • At least five years of experience in machine learning infrastructure, including developing, designing, scaling, training, deploying, and optimizing large-scale machine learning systems is preferred.
  • Solid experience in the development and optimization of machine learning infrastructure tools such as DeepSpeed, PyTorch, or TensorFlow is preferred.
  • Strong expertise in distributed training techniques, including gradient sharding and optimization strategies for scaling large models across ML accelerators, is preferred.
  • Experience with profiling tools and the ability to uncover performance bottlenecks is preferred.
  • Deep understanding of state-of-the-art machine learning models such as auto-regressive transformers is preferred.
  • Familiarity with custom kernels for compute efficiency on diverse hardware is preferred.
  • Practical familiarity with autonomous driving, simulation, and ML accelerators is a plus.

Responsibilities

  • Advance state-of-the-art ultra-realistic multi-agent simulations using foundation models.
  • Collaborate with Google DeepMind and Waymo Realism Modeling teams to improve simulation realism.
  • Provide technical leadership on large-scale ML model architectures for autonomous vehicles.
  • Work at the intersection of data engineering, model development, and deployment.
  • Provide guidance on architectural decisions and technical directions.
  • Own large, complex systems and drive architectures to meet technical and business objectives.
  • Design and scale large distributed systems covering the ML lifecycle.
  • Support planet-scale dataset generation and model training.
  • Collaborate cross-functionally to derive performance and system-level requirements for large ML systems.
  • Translate product and business goals into measurable technical deliverables.
  • Ensure system component alignment.
  • Mentor junior engineers and foster a collaborative culture.

Skills

Python

Degrees

BS in Computer Science · BS in Robotics · MS in Computer Science · MS in Robotics

Relocation

No