Lead DevOps/Platform Engineer IV - Richland, WA

Pacific Northwest National Laboratory · Richland, WA, United States
Experience: 3-18 years · Onsite
Remuneration
161,300-255,000 USD per year
Location
Richland, WA, United States
Visa sponsorship
Not specified

Job summary

PNNL is seeking a Lead DevOps/Platform Engineer to contribute to advanced AI engineering initiatives, focusing on next-generation systems like agentic AI platforms, large-scale data orchestration, and real-time intelligence processing. This role involves applying expertise in scalable system design and AI/ML engineering to build mission-critical capabilities, while also developing technical leadership and mentoring junior team members.

Benefits

Medical insurance, dental insurance, vision insurance, telehealth care options, mental health benefits, wellness coaching, health savings account, flexible spending accounts, basic life insurance, disability insurance, employee assistance program, business travel insurance, tuition assistance, relocation assistance, backup childcare, legal benefits, supplemental parental bonding leave, surrogacy and adoption assistance, fertility support, company-funded pension plan

Qualifications

  • PhD and 3 years of Software Engineering experience
  • MS/MA and 5 years of Software Engineering experience
  • BS/BA and 7 years of Software Engineering experience
  • AA and 16 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
  • HS/GED and 18 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
  • Degree in computer science, software engineering, or a related field
  • Track record of architecting and operating large-scale infrastructure supporting significant user bases, high-volume transaction systems, petabyte-scale data platforms, or production ML systems serving millions of predictions
  • Experience building and leading high-performing platform engineering, DevOps, or MLOps teams through hiring, mentoring, technical guidance, and career development
  • Experience establishing infrastructure practices, platform strategies, MLOps frameworks, and DevOps transformation initiatives at organizational scale
  • Background in mission-critical, regulated, or high-security environments (government, defense, financial services, healthcare), with an understanding of compliance requirements for both traditional systems and ML/AI applications
  • Demonstrated success leading complex, multi-team infrastructure and MLOps initiatives from architecture through production deployment, operational handoff, and continuous improvement
  • Expert-level proficiency in Python and at least one additional language (Go, C#/.NET, or C++)
  • Proven ability to establish infrastructure automation standards, architect scalable tooling platforms, and guide teams in developing sophisticated automation frameworks
  • Mastery of Infrastructure as Code principles with deep expertise in Terraform, CloudFormation, Pulumi, or ARM templates
  • Demonstrated ability to design enterprise-wide IaC strategies, module libraries, and governance frameworks
  • Proven track record of architecting and leading implementation of enterprise-grade CI/CD platforms
  • Ability to define build/release strategies, establish deployment patterns, and drive continuous delivery adoption
  • Experience designing internal developer platforms that abstract complexity and accelerate team velocity
  • Expert proficiency with GitOps methodologies (ArgoCD, Flux), infrastructure testing frameworks (Terratest, InSpec), and policy-as-code (OPA, Sentinel)
  • Strategic use of AI-assisted tools to drive team productivity, accelerate automation development, and optimize operational efficiency

Responsibilities

  • Contribute to next-generation systems including agentic AI platforms, large-scale data orchestration, and real-time intelligence processing
  • Apply expertise in scalable system design and AI/ML engineering to build mission-critical capabilities
  • Develop technical leadership and establish yourself as a key contributor to the engineering community
  • Design and deploy scalable agentic AI systems with dynamic reasoning and decision-making capabilities
  • Architect LLM orchestration frameworks using LangChain, LlamaIndex, and emerging agent platforms
  • Build MLOps platforms spanning experiment tracking, model versioning, deployment, and governance
  • Develop developer-focused tooling, adapters, and interfaces for AI-native frameworks
  • Integrate multi-modal data sources (text, vision, structured/sensor data) into cohesive reasoning pipelines
  • Design microservices architectures coordinating across multiple domains and security enclaves
  • Lead the design of distributed systems that process data from hundreds of sources simultaneously
  • Architect event-driven, real-time streaming platforms that handle terabytes of data per hour
  • Build robust data pipelines for petabyte-scale ETL, data lake/mesh architectures, and real-time analytics
  • Design container orchestration (Kubernetes) and CI/CD pipelines for classified and edge environments
  • Deploy AI systems in highly secure environments with resilient agent-to-agent communications
  • Create monitoring and observability systems (logging, metrics, tracing) across secure enclaves
  • Ensure compliance with ethical AI standards and security-first DevOps practices
  • Build geospatial processing, time-series, and intelligence data fusion capabilities
  • Lead a team of engineers to deliver high-risk, high-impact work with an ambiguous technical scope
  • Drive technical strategy and architectural decisions across cross-functional teams
  • Translate ambiguous requirements and cutting-edge research into actionable technical roadmaps

Skills

Airflow, Argo CD, ARM Templates, AWS, Azure, Azure Key Vault, CloudFormation, Consul, Cortex, C++, C#, Databricks, .NET, EventBridge, Flux, GCP, Go, Istio, Jaeger, Kafka, Kubernetes, Linkerd, Loki, MongoDB, Open Policy Agent, PostgreSQL, Prometheus, Pub/Sub, Pulumi, Python, Redshift, S3, Secrets Manager, Snowflake, SNS, Spark, Splunk, SQS, Tempo, Terraform, Thanos, Vault

Degrees

PhD, MS/MA, BS/BA, AA, HS/GED · Computer Science, Software Engineering

Security clearance

Federal security clearance; access to classified matter in accordance with 10 CFR 710, Appendix B

Relocation

Yes