SRE Engineer

GBST · London, ENG, United Kingdom

London, ENG, United Kingdom · Full time · Hybrid

Remuneration

Not specified

Location

London, ENG, United Kingdom

Visa sponsorship

Not specified

Job summary

GBST is seeking an SRE Engineer to join their global, diverse team in London. This permanent full-time role involves managing and optimizing infrastructure for high availability, automating infrastructure with AWS, and implementing resilience testing strategies. The ideal candidate will be a problem solver with excellent communication skills and a desire to improve things.

Benefits

Instant savings and discounts on major retailers
Private Health Insurance including Dental and Optical Cover
Non-contributory Pension Scheme
Salary Sacrifice Schemes – Car, Cycle to Work and Additional Pension Contributions
Additional GBST & U day off every year
Employee Assistance Program (EAP)
LinkedIn Learning

Qualifications

Ability to work on multiple tasks in parallel
Problem solver
Excellent communicator
Desire to improve things
Experience with Kubernetes and application troubleshooting
Experience with application deployment using GitOps / ArgoCD
Experience with K8s and application logging (Loki / Fluent Bit)
Experience with Service Mesh (Linkerd preferred)
Experience with Ingress Config / Troubleshooting (AWS LB Controller / Nginx)
Experience with Autoscaling configuration (Karpenter)
Experience with Certificate management (cert-manager)
Experience with AWS services including EKS, RDS, DMS, RDS Proxy, AWS Backup, API Gateway, RabbitMQ, AWS Transfer Family (SFTP / SFTP Connector), AWS NGFW, TGW, PrivateLink, AppStream, Lambda (Python), IAM, Kinesis, DynamoDB
Experience with Terragrunt / Terraform for troubleshooting defects
Experience with GitOps using Helm / ArgoCD
Experience with Observability Tooling including Grafana, Prometheus, Loki, and CloudWatch configuration/dashboard creation
Experience with CI/CD using Git / CodeDeploy / CodePipeline
Experience with the AWS cloud platform including designing, deploying, and maintaining scalable infrastructure
Strong knowledge of container orchestration tools like Kubernetes and Docker
Familiarity with deploying Infrastructure as Code (IaC) with Terraform and CloudFormation
Understanding of implementing resilience testing strategies

Responsibilities

Manage and optimize infrastructure to ensure high availability and system reliability
Deliver 24/7 support via an on-call rotation for after-hours issues
Participate in incident response processes, including triage, mitigation, and communication
Respond to production incidents, troubleshoot issues across the full stack, and ensure minimal downtime by driving root cause analysis and applying long-term fixes
Conduct blameless post-mortems to identify root causes and derive actionable insights, ensuring continuous improvement
Develop playbooks for common incidents, reducing Mean Time to Resolution (MTTR)

Skills

Argo CD, AWS, Bash, CloudFormation, CloudWatch, Datadog, Docker, DynamoDB, EKS, Fluent Bit, Git, Go, Grafana, Helm, IAM, Java, Kinesis, Kubernetes, AWS Lambda, Linkerd, Loki, New Relic, NGINX, Opsgenie, PagerDuty, Prometheus, Python, RabbitMQ, Ruby, Terraform, Terragrunt, GitHub Actions

Work schedule

24/7 support via on-call rotation

Industry

Wealth management, Financial services
