Senior Site Reliability Engineer
Department for Work and Pensions (DWP) · Birmingham, ENG, United Kingdom
Birmingham, ENG, United Kingdom · 57,946-80,664 GBP/yearly · Hybrid
Remuneration
57,946-80,664 GBP/yearly
Location
Birmingham, ENG, United Kingdom
Visa sponsorship
No visa sponsorship
Please note that while we consider sponsorship requests in accordance with current DWP guidance and Home Office policy, sponsorship cannot be guaranteed.
Job summary
The Department for Work and Pensions (DWP) is seeking a Senior Site Reliability Engineer to join its SRE teams and drive the adoption of SRE best practices across its cloud estate. The role involves ensuring the reliability and performance of applications and infrastructure, working with development teams, and providing technical direction to other SREs.
Benefits
- Civil Service Pension with employer contribution of 28.97%
- Working patterns to support work/life balance (job sharing, term-time working, f…
- Generous annual leave (25-30 days plus 9 public/privilege days)
- Financial wellbeing support (interest-free season ticket loans, cycle to work sc…
- Health and wellbeing support (Employee Assistance Programme, HASSRA membership)
- Family friendly policies (enhanced maternity and shared parental leave pay)
- Funded learning and development (qualifications, accreditations, coaching, mento…
- Inclusive and diverse environment with professional and interpersonal networks
Qualifications
- Pass Security Check clearance
- Demonstrable experience in reliability engineering including capacity and performance management through monitoring, logging, and alerting
- Demonstrable experience supporting a Live Service, including live operations, incident management, and continuous improvement
- Demonstrable experience developing and supporting cloud-based applications in AWS
- Demonstrable experience building and maintaining CI/CD pipelines
- Demonstrable experience communicating effectively with stakeholders at multiple levels
- Demonstrable experience using automation to remove toil with scripting, infrastructure, and configuration as code
Responsibilities
- Drive adoption of SRE best practice across the cloud estate
- Work with teams to ensure standards and governance are met for cloud onboarding
- Ensure citizen-facing applications meet operational and security needs for production
- Ensure reliability and performance of applications and infrastructure
- Collaborate with application teams on developing reliable and secure solutions
- Provide technical direction and support to other SREs
- Guide development teams on good practice and department standards for application infrastructure
- Design and develop techniques for improving application reliability, run books, knowledge transfer, and SRE strategy
- Collaborate with development teams, provide best practice guidance, and ensure application monitoring
- Foster engineering ownership, SRE best practice, and integrity/maintenance of Live Service
- Manage error budget and align work accordingly
- Act as focal point for investigation and resolution of major/complex incidents
- Assess impact of change requests, provide technical expertise, and authorize changes
- Coach and mentor application development and operations engineers in SRE practices
- Conduct reviews for high priority and major incidents
- Seek and capture ideas from stakeholders and team members for improvements
- Provide on-call support to restore services
- Reduce toil and increase automation to improve reliability and reduce repetitive tasks
Skills
AWS
Work schedule
- Flexible working
- Full-time
- Job share
- Part-time
- On-call rota
Security clearance
Security Check
Relocation
No