Senior Site Reliability Engineer
Boston Consulting Group · London, ENG, United Kingdom
London, ENG, United Kingdom · Exp: 5-8 yrs · Hybrid
Remuneration
Not specified
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified
Job summary
The Senior Site Reliability Engineer is responsible for running the engineering capability behind a defined area of reliability across the organization. This role involves working across multiple SRE disciplines, applying engineering thinking to reduce operational toil, improve resilience, and embed reliability and governance into delivery and operational workflows. The role drives engineering quality and consistency, contributes to wider engineering standards, and helps shape how reliability is delivered across the organization. It builds reusable patterns, mentors engineers, and provides senior engineering input across a wider set of stakeholders.
Qualifications
- 5–8 years of experience in Site Reliability Engineering, Platform Engineering, or related operational engineering disciplines.
- Strong hands-on experience across multiple SRE domains, including cloud, automation, observability, and CI/CD.
- Demonstrated experience designing and implementing automation and reliability solutions at scale.
- Deep knowledge of at least one cloud platform (AWS or Azure), including networking, identity, and observability primitives.
- Experience with Infrastructure-as-Code (e.g., Terraform) and CI/CD pipelines.
- Strong scripting experience (e.g., Python).
- Experience leading incident response and driving systemic improvement.
- Strong stakeholder engagement and technical communication skills.
- Deep hands-on experience with one or more enterprise observability platforms (e.g., Splunk, Datadog).
- Proven experience designing and operating telemetry pipelines, ingestion controls, and observability cost management.
- Proven experience designing signals (SLIs, SLOs, synthetic checks, alerts) and ops automation triggered from those signals.
- Experience driving SLO/SLI practices across multiple teams.
- Deep hands-on experience operating cloud infrastructure across at least two of AWS, Azure, GCP, or Alibaba Cloud.
- Proven experience designing reusable IaC patterns and landing zone components across cloud providers.
- Strong working knowledge of cloud networking, account management, identity primitives, and policy enforcement across providers.
- Experience driving cloud platform engineering standards and governance across multiple teams.
- Deep hands-on experience with identity platforms (e.g., Entra ID) and secrets management (e.g., HashiCorp Vault).
- Proven experience designing OIDC, workload identity, and dynamic credential patterns.
- Experience driving Zero Trust and least-privilege adoption across multiple teams.
- Deep hands-on experience with security tooling embedded in CI/CD pipelines.
Responsibilities
- Run and continuously improve reliability engineering systems within scope, including automation, pipelines, observability, and operational tooling.
- Design and implement engineering solutions to eliminate operational toil at scale and embed reliability into delivery workflows.
- Help shape engineering standards, patterns, and reusable frameworks across the SRE practice.
- Lead engineering response to complex incidents within scope, drive systemic remediation, and contribute to post-incident learning.
- Mentor and coach less senior engineers across reliability engineering, automation, observability, and SRE principles.
- Drive cross-team collaboration with engineering, platform, and operations functions to embed reliability and governance through engineering controls.
- Communicate engineering status, risks, and recommendations clearly to senior stakeholders and leadership forums.
- Contribute to monthly operational reviews with structured metrics on service health, ingestion or pipeline performance, automation coverage, and improvement progress.
Skills
Alibaba Cloud, AWS, Azure, Datadog, Docker, GCP, Kubernetes, Open Policy Agent, PagerDuty, Python, ServiceNow, Splunk, Terraform, Vault
Certifications
Cloud certification at professional level
Work schedule
On-call rotation
Travel
Occasional travel for team or stakeholder engagement
Relocation
No