Sr. Site Reliability Engineer
Vertafore · Denver, CO, United States
Denver, CO, United States · Exp: 8+ yrs · 110,000-155,000 USD/yearly · Remote
Remuneration
110,000-155,000 USD/yearly
Location
Denver, CO, United States
Visa sponsorship
No visa sponsorship
The selected candidate must be legally authorized to work in the United States.
Job summary
Vertafore is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, performance, and operational integrity of critical production services. The role owns the full service lifecycle, from design through incident response and continuous improvement, and operates autonomously across AWS and hybrid data center environments. It requires strong software engineering skills and treats reliability as a core engineering responsibility.
Benefits
- Medical plan
- Vision plan
- Dental plan
- Life insurance
- AD&D insurance
- Short term disability
- Long term disability
- Pension plan
- Employer match
- Maternity leave
- Paternity leave
- Parental leave
- Employee and family assistance program
- Education assistance
- Employee referral program
- Internal recognition program
- PPO options
- High-deductible options
- Health savings account
- Flexible spending accounts
Qualifications
- 8+ years of hands-on Site Reliability Engineering or reliability-focused engineering experience with end-to-end service ownership
- Proven track record of operating at senior engineering scope, with accountability for reliability outcomes
- Strong software engineering skills
- Practical experience applying SRE principles (SLIs, SLOs, error budgets)
- Hands-on experience with AWS, Kubernetes, CI/CD, infrastructure as code, and hybrid environments
- Strong knowledge of Linux and Windows systems, application platforms, and relational databases
- Bachelor’s or master’s degree in computer science or equivalent experience
- Participation in an on-call rotation
- Ability to work flexible hours as required
- High-speed internet for remote work
Responsibilities
- Own production services end-to-end
- Ensure reliability, availability, scalability, performance, and operational health
- Define and manage SLIs and SLOs
- Use error budgets to guide delivery decisions
- Influence service and system design for improved fault tolerance, observability, and operational sustainability
- Debug complex production issues across application code, services, and infrastructure
- Perform root cause analysis using logs, metrics, traces, and code-level investigation
- Build automation and self-healing mechanisms to prevent repeat failures
- Execute production changes with safety, automation, and observability
- Design and operate production observability aligned to service health and customer impact
- Lead and participate in incident response for high-severity events
- Collaborate with engineering, product, architecture, and operations teams
- Operate with autonomy and sound judgment in reliability decisions
Skills
- AWS
- C#
- .NET
- Java
- Kubernetes
- Linux
- Python
- Windows
Degrees
- Bachelor’s degree in computer science
- Master’s degree in computer science
Languages
- C#
- .NET
- Java
- Python
- React
Work schedule
- On-call rotation
- Flexible hours
Travel
Occasional travel to office
Industry
Insurance
Relocation
No