Sr. Site Reliability Engineer

Vertafore · Denver, CO, United States
Denver, CO, United States · Exp: 8+ yrs · 110,000-155,000 USD/yearly · Remote
Remuneration
110,000-155,000 USD/yearly
Location
Denver, CO, United States
Visa sponsorship
No visa sponsorship
The selected candidate must be legally authorized to work in the United States.

Job summary

Vertafore is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, performance, and operational integrity of critical production services. The role owns the full service lifecycle, from design through incident response and continuous improvement, and operates autonomously across environments that include AWS and hybrid data centers. The position requires strong software engineering skills and treats reliability as a core engineering responsibility.

Benefits

  • Medical plan
  • Vision plan
  • Dental plan
  • Life insurance
  • AD&D insurance
  • Short term disability
  • Long term disability
  • Pension plan
  • Employer match
  • Maternity leave
  • Paternity leave
  • Parental leave
  • Employee and family assistance program
  • Education assistance
  • Employee referral program
  • Internal recognition program
  • PPO options
  • High-deductible options
  • Health savings account
  • Flexible spending accounts

Qualifications

  • 8+ years of hands-on Site Reliability Engineering or reliability-focused engineering experience with end-to-end service ownership
  • Proven track record of operating at senior engineering scope, with accountability for reliability outcomes
  • Strong software engineering skills
  • Practical experience applying SRE principles (SLIs, SLOs, error budgets)
  • Hands-on experience with AWS, Kubernetes, CI/CD, infrastructure as code, and hybrid environments
  • Strong knowledge of Linux and Windows systems, application platforms, and relational databases
  • Bachelor’s or master’s degree in computer science or equivalent experience
  • Participation in an on-call rotation
  • Ability to work flexible hours as required
  • High-speed internet for remote work

Responsibilities

  • Own production services end-to-end
  • Ensure reliability, availability, scalability, performance, and operational health
  • Define and manage SLIs and SLOs
  • Use error budgets to guide delivery decisions
  • Influence service and system design for improved fault tolerance, observability, and operational sustainability
  • Debug complex production issues across application code, services, and infrastructure
  • Perform root cause analysis using logs, metrics, traces, and code-level investigation
  • Build automation and self-healing mechanisms to prevent repeat failures
  • Execute production changes with safety, automation, and observability
  • Design and operate production observability aligned to service health and customer impact
  • Lead and participate in incident response for high-severity events
  • Collaborate with engineering, product, architecture, and operations teams
  • Operate with autonomy and sound judgment in reliability decisions

Skills

  • AWS
  • C#.NET
  • Java
  • Kubernetes
  • Linux
  • Python
  • Windows

Degrees

  • Bachelor’s degree in computer science
  • Master’s degree in computer science

Languages

  • C#.NET
  • Java
  • Python
  • React

Work schedule

  • On-call rotation
  • Flexible hours

Travel

Occasional travel to office

Industry

Insurance

Relocation

No