Staff Machine Learning Infrastructure Engineer, Simulation
Waymo · London, ENG, United Kingdom
London, ENG, United Kingdom · Full time · Exp: 5+ yrs · 155,000-163,000 GBP/yearly · Remote
Remuneration
155,000-163,000 GBP/yearly
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified
Job summary
Waymo is seeking an experienced Staff Machine Learning Infrastructure Engineer to lead the development of scalable AI/ML infrastructure for state-of-the-art, ML accelerator-friendly simulations built on multi-billion-parameter foundation models. The successful candidate will provide technical leadership, design distributed systems, and mentor junior engineers.
Qualifications
- Bachelor's degree in Computer Science, Robotics, or a similar technical field, or equivalent practical experience.
- Five or more years of professional software engineering experience.
- At least three years of experience in machine learning infrastructure, including developing, scaling, training, deploying, and optimizing large-scale machine learning systems.
- Master's degree in Computer Science, Robotics, or a similar technical field, or equivalent practical experience, is preferred.
- Ten or more years of professional software engineering experience is preferred.
- At least five years of experience in machine learning infrastructure, including designing, developing, scaling, training, deploying, and optimizing large-scale machine learning systems, is preferred.
- Solid experience in the development and optimization of machine learning infrastructure tools such as DeepSpeed, PyTorch, or TensorFlow is preferred.
- Strong expertise in distributed training techniques, including gradient sharding and optimization strategies for scaling large models across ML accelerators, is preferred.
- Experience with profiling tools and the ability to uncover performance bottlenecks is preferred.
- Deep understanding of state-of-the-art machine learning models such as auto-regressive transformers is preferred.
- Familiarity with custom kernels for compute efficiency across diverse hardware is preferred.
- Practical familiarity with autonomous driving, simulation, and ML accelerators is a plus.
Responsibilities
- Advance state-of-the-art ultra-realistic multi-agent simulations using foundation models.
- Collaborate with Google DeepMind and Waymo Realism Modeling teams to improve simulation realism.
- Provide technical leadership on large-scale ML model architectures for autonomous vehicles.
- Work at the intersection of data engineering, model development, and deployment.
- Provide guidance on architectural decisions and technical directions.
- Own large, complex systems and drive architectures to meet technical and business objectives.
- Design and scale large distributed systems covering the ML lifecycle.
- Support planet-scale dataset generation and model training.
- Collaborate cross-functionally to derive performance and system-level requirements for large ML systems.
- Translate product and business goals into measurable technical deliverables.
- Ensure system component alignment.
- Mentor junior engineers and foster a collaborative culture.
Skills
Python
Degrees
BS in Computer Science
BS in Robotics
MS in Computer Science
MS in Robotics
Relocation
No