Site Reliability Engineer

FalconSmartIT · Brighton, ENG, United Kingdom

Brighton, ENG, United Kingdom · Exp: 12+ yrs · Hybrid

Remuneration

Not specified

Location

Brighton, ENG, United Kingdom

Visa sponsorship

Not specified

Job summary

Site Reliability Engineer responsible for modernizing IT operations through observability practices and automation.

Qualifications

Expertise in implementing Site Reliability Engineering principles.
Knowledge of observability tools like Dynatrace and Datadog.
Proficiency in automation and scripting with Python and Ansible.
Experience with cloud platforms AWS and Azure.
Understanding of containerization and orchestration tools.
Proficiency in cloud-native distributed systems and microservices.
Exposure to AI/ML techniques for automated problem resolution.
Familiarity with CI/CD pipelines and automated release solutions.
Experience with chaos engineering tools.
Ability to manage multiple projects in a fast-paced environment.
Strong interpersonal and communication skills.
Excellent problem-solving and analytical thinking skills.

Responsibilities

Implement strategies for modernizing IT operations and enhancing observability.
Architect and deploy observability platforms for system monitoring.
Drive AI-driven alerting and proactive anomaly detection.
Develop and enforce SRE best practices including SLOs and SLIs.
Create an AIOps roadmap to improve operational efficiency.
Automate repetitive tasks using scripting and orchestration tools.
Drive automated incident responses and self-healing automation.
Collaborate with teams to ensure system scalability and resilience.
Manage incident and root cause analysis processes through automation.
Partner with teams to enable shift-left engineering practices.
Mentor teams on SRE principles and tools.
Advocate for a culture of reliability and continuous improvement.

Skills

Ansible, AWS, Azure, Datadog, Docker, Dynatrace, Kubernetes, Python

Relocation

No