MLOps Engineer

LUMAI · Oxford, ENG, United Kingdom

Oxford, ENG, United Kingdom · Exp: 5+ yrs · Remote

Remuneration

Not specified

Location

Oxford, ENG, United Kingdom

Visa sponsorship

Not specified

Job summary

Lumai is seeking an MLOps Engineer to design, build, and operate the infrastructure for taking AI models from research to silicon-validated production. This high-impact role involves working at the intersection of ML research, compiler stacks, and novel hardware, contributing to a breakthrough AI accelerator for data centers. The successful candidate will be crucial in enabling AI and hardware teams to move quickly and efficiently.

Benefits

Highly Competitive Salary
Share Option Scheme
Pension Scheme
Private Health Insurance
Cycle to Work
L&D Allowance
Subsidised On-site Lunches
25 days paid holiday (plus bank holidays)
Socials

Qualifications

5+ years of software or infrastructure engineering experience, with at least 2 years in an ML or AI-adjacent role
Strong Python skills and familiarity with major ML frameworks (PyTorch or JAX); comfortable reading and modifying model code
Hands-on experience building and operating ML pipelines in production: data pipelines, training orchestration, evaluation, and serving
Experience with experiment tracking and model lifecycle management tools (MLflow, W&B, DVC, or similar)
Solid understanding of containerisation (Docker) and orchestration (Kubernetes or Slurm) for distributed compute workloads
Infrastructure-as-code mindset: Terraform, Ansible, or equivalent; CI/CD pipelines (GitHub Actions, Jenkins, or similar)
Experience with hardware-accelerated compute (CUDA/GPU workflows, profiling, performance tuning), even if not on custom silicon
Strong debugging and observability skills: distributed tracing, logging, metrics dashboards
Ability to work effectively in a fast-moving, ambiguous environment where the hardware and software are both being built simultaneously
Experience with custom or novel accelerator hardware (FPGAs, ASICs, NPUs, or research chips)
Familiarity with ML compiler stacks: MLIR, LLVM, TVM, XLA, or vendor-specific compilers (NVCC, TensorRT, etc.)
Experience with model optimisation techniques: quantisation (INT8/INT4/FP8), pruning, distillation, or mixed-precision training
Background in on-chip performance profiling and roofline analysis
Exposure to chip bring-up workflows: running early software stacks on pre-silicon simulation or first-silicon hardware
Contributions to open-source ML infrastructure or compiler tooling
Experience in a deeptech, semiconductor, or hardware startup environment

Responsibilities

Design and operate end-to-end ML pipelines: data ingest, training, evaluation, quantisation, and deployment onto custom AI accelerator hardware
Build and maintain experiment tracking, model registry, and versioning infrastructure (e.g. MLflow, W&B, or equivalent) tuned to hardware-in-the-loop workflows
Own CI/CD for ML: automated testing of model correctness, numerical accuracy, and on-chip performance after every change to models, compilers, or firmware
Develop and maintain tooling for benchmarking model inference on custom silicon, including latency, throughput, power, and utilisation metrics
Collaborate closely with ML researchers, compiler engineers, and hardware architects to identify and remove bottlenecks across the model-to-chip workflow
Instrument and monitor production inference deployments; design alerting and rollback strategies appropriate to hardware-accelerated serving
Manage compute resource scheduling across on-premises accelerator clusters and cloud (GPU/CPU) for training and simulation workloads
Drive infrastructure-as-code practices: containerisation, orchestration (Kubernetes/Slurm), and reproducible environment management
Contribute to the internal developer platform: self-service tooling, documentation, and runbooks that raise engineering productivity across the company

Skills

Ansible
Docker
GitHub
GitHub Actions
Jenkins
Kubernetes
Python
Terraform

Languages

Python

Industry

Deeptech
Semiconductor
Hardware startup
