Site Reliability Engineer
FalconSmartIT · Brighton, ENG, United Kingdom
Brighton, ENG, United Kingdom · Exp: 12+ yrs · Hybrid
Remuneration
Not specified
Location
Brighton, ENG, United Kingdom
Visa sponsorship
Not specified
Job summary
Site Reliability Engineer responsible for modernizing IT operations through observability practices and automation.
Qualifications
- Expertise in implementing Site Reliability Engineering principles.
- Knowledge of observability tools such as Dynatrace and Datadog.
- Proficiency in automation and scripting with Python and Ansible.
- Experience with cloud platforms (AWS and Azure).
- Understanding of containerization and orchestration tools.
- Proficiency in cloud-native distributed systems and microservices.
- Exposure to AI/ML techniques for automated problem resolution.
- Familiarity with CI/CD pipelines and automated release solutions.
- Experience with chaos engineering tools.
- Ability to manage multiple projects in a fast-paced environment.
- Strong interpersonal and communication skills.
- Excellent problem-solving and analytical thinking skills.
Responsibilities
- Implement strategies for modernizing IT operations and enhancing observability.
- Architect and deploy observability platforms for system monitoring.
- Drive AI-driven alerting and proactive anomaly detection.
- Develop and enforce SRE best practices including SLOs and SLIs.
- Create an AIOps roadmap for operational efficiency.
- Automate repetitive tasks using scripting and orchestration tools.
- Drive automated incident responses and self-healing automation.
- Collaborate with teams to ensure system scalability and resilience.
- Manage incident and root cause analysis processes through automation.
- Partner with teams to enable shift-left engineering practices.
- Mentor teams on SRE principles and tools.
- Advocate for a culture of reliability and continuous improvement.
Skills
Ansible, AWS, Azure, Datadog, Docker, Dynatrace, Kubernetes, Python
Relocation
No