SRE (Terminal)
mLabs · London, ENG, United Kingdom
Full time · Onsite
Remuneration
Not specified
Location
London, ENG, United Kingdom
Visa sponsorship
Not specified
Job summary
Our client, a high-growth software development organization in the decentralized crypto social networks space, is seeking a battle-tested Site Reliability Engineering (SRE) Expert. This role involves navigating ambiguous, critical infrastructure challenges, scoping solutions, making architectural trade-offs, and executing with precision to ensure continuous uptime of a high-stakes, high-throughput environment.
Benefits
- Competitive Base Salary
- Equity
- Token Allocation
Qualifications
- Deep expertise in infrastructure-as-code (Terraform/OpenTofu), network topology, high-availability architecture, and system internals.
- Experience building foundational infrastructure and running high-availability environments where reliability is treated with financial-system levels of seriousness.
- Advanced proficiency with modern cloud providers (AWS, GCP) and container orchestration platforms (Kubernetes).
- Strong capacity to operate independently in high-stakes environments, deciding when to gather consensus versus when to execute autonomously.
- Experience with infrastructure security hardening, IAM architecture, or compliance mapping (e.g., SOC2, ISO).
- Hands-on experience managing and scaling high-throughput, low-latency data backbones and event streaming systems (Kafka, Redpanda, PostgreSQL).
- Working understanding of Web3/crypto infrastructure patterns and comfort operating within them.
Responsibilities
- Design, scale, and maintain highly available cloud infrastructure, including multi-region and active-active patterns.
- Lead critical incident response efforts, participate in on-call rotations, and drive comprehensive, blameless post-mortems to continuously harden the system.
- Write clean, production-grade automation code for infrastructure tooling, operators, and seamless systems integration.
- Exercise sharp judgment regarding system risks, balancing rapid deployment velocity with robust infrastructure safety and stability.
- Raise the engineering and operational bar across the organization through the implementation of rigorous standards, modern tooling, and technical mentorship.
Skills
AWS, GCP, Go, IAM, Kafka, Kubernetes, OpenTofu, PostgreSQL, Python, Terraform
Industry
Software development, Decentralized crypto social networks
Relocation
No