Senior Site Reliability Engineer (C#, .NET)
Climavision · United States · Remote
United States · Exp: 7+ yrs · 135,000-170,000 USD/yearly · Remote
Remuneration
135,000-170,000 USD/yearly
Location
United States · Remote
Eastern Daylight Time (UTC-4)
Visa sponsorship
Not specified
Job summary
Climavision is seeking a Senior Site Reliability Engineer to enhance the reliability and operational excellence of its customer-facing platform and weather data services.
Qualifications
- Bachelor's degree in computer science, software engineering, or a related field.
- Minimum of 7 years of experience in Site Reliability Engineering or a related role.
- Strong software engineering experience with C# / .NET applications.
- Experience refactoring production application code for horizontal scalability.
- Experience designing or operating multi-cluster high-availability architectures.
- Experience supporting customer-facing production systems.
- Hands-on experience operating production workloads in Kubernetes environments.
- Experience diagnosing and resolving production incidents.
- Experience operating Kubernetes outside of managed cloud environments.
- Experience with Kubernetes operational tooling and ecosystem technologies.
- Understanding of infrastructure automation and Infrastructure as Code concepts.
- Experience supporting CI/CD and production deployment pipelines.
- Experience with monitoring, logging, and observability platforms.
- Experience operating distributed systems and microservice architectures.
- Working knowledge of Microsoft Azure infrastructure.
- Strong troubleshooting skills across infrastructure and application layers.
- Experience participating in a structured production on-call rotation.
- Strong written and verbal communication skills.
Responsibilities
- Own production reliability for the customer-facing platform and weather data services.
- Contribute to SLIs, SLOs, alerting standards, and operational metrics.
- Support production incident response efforts.
- Diagnose and resolve complex production issues.
- Drive multi-replica and multi-cluster high availability across .NET services.
- Operate and improve the self-managed Kubernetes platform.
- Ensure Kubernetes platform lifecycle activities are executed properly.
- Improve reliability and operational maturity of production services.
- Design and validate Kubernetes workloads for resiliency and scalability.
- Read, debug, and contribute production-quality C# / .NET code.
- Partner with software engineering teams to improve production readiness.
- Maintain and improve deployment pipelines and infrastructure automation.
- Support and evolve the observability platform.
- Conduct performance engineering and capacity-planning efforts.
- Facilitate blameless postmortem reviews.
- Improve disaster recovery and business continuity capabilities.
- Drive operational excellence initiatives.
Skills
Ansible, Azure, C#, Datadog, .NET, Grafana, Helm, Istio, Kubernetes, Loki, NATS, Octopus Deploy, OpenTelemetry, PostgreSQL, Prometheus, RabbitMQ, Rancher, Terraform
Relocation
No