Senior Site Reliability Engineer (C#, .NET)

Climavision · United States · Remote
United States · Exp: 7+ yrs · 135,000-170,000 USD/yearly · Remote
Remuneration
135,000-170,000 USD/yearly
Location
United States · Remote
Eastern Daylight Time (UTC-4)
Visa sponsorship
Not specified

Job summary

Climavision is seeking a Senior Site Reliability Engineer to improve the reliability and operational excellence of its customer-facing platform and weather data services.

Qualifications

  • Bachelor's degree in computer science, software engineering, or a related field.
  • Minimum of 7 years of experience in Site Reliability Engineering or a related role.
  • Strong software engineering experience with C# / .NET applications.
  • Experience refactoring production application code for horizontal scalability.
  • Experience designing or operating multi-cluster high-availability architectures.
  • Experience supporting customer-facing production systems.
  • Hands-on experience operating production workloads in Kubernetes environments.
  • Experience diagnosing and resolving production incidents.
  • Experience operating Kubernetes outside of managed cloud environments.
  • Experience with Kubernetes operational tooling and ecosystem technologies.
  • Understanding of infrastructure automation and Infrastructure as Code concepts.
  • Experience supporting CI/CD and production deployment pipelines.
  • Experience with monitoring, logging, and observability platforms.
  • Experience operating distributed systems and microservice architectures.
  • Working knowledge of Microsoft Azure infrastructure.
  • Strong troubleshooting skills across infrastructure and application layers.
  • Experience participating in a structured production on-call rotation.
  • Strong written and verbal communication skills.

Responsibilities

  • Own production reliability for the customer-facing platform and weather data services.
  • Contribute to SLIs, SLOs, alerting standards, and operational metrics.
  • Support production incident response efforts.
  • Diagnose and resolve complex production issues.
  • Drive multi-replica and multi-cluster high availability across .NET services.
  • Operate and improve the self-managed Kubernetes platform.
  • Ensure Kubernetes platform lifecycle activities are executed properly.
  • Improve reliability and operational maturity of production services.
  • Design and validate Kubernetes workloads for resiliency and scalability.
  • Read, debug, and contribute production-quality C# / .NET code.
  • Partner with software engineering teams to improve production readiness.
  • Maintain and improve deployment pipelines and infrastructure automation.
  • Support and evolve the observability platform.
  • Conduct performance engineering and capacity-planning efforts.
  • Facilitate blameless postmortem reviews.
  • Improve disaster recovery and business continuity capabilities.
  • Drive operational excellence initiatives.

Skills

Ansible, Azure, C#, Datadog, .NET, Grafana, Helm, Istio, Kubernetes, Loki, NATS, Octopus Deploy, OpenTelemetry, PostgreSQL, Prometheus, RabbitMQ, Rancher, Terraform

Relocation

No