Sr. Site Reliability Engineer

Vertafore · Denver, CO, United States

Denver, CO, United States · Exp: 8+ yrs · 110,000-155,000 USD/yearly · Remote

Remuneration

110,000-155,000 USD/yearly

Location

Denver, CO, United States

Visa sponsorship

No visa sponsorship

The selected candidate must be legally authorized to work in the United States.

Job summary

Vertafore is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, performance, and operational integrity of critical production services. The role owns the full service lifecycle, from design through incident response and continuous improvement, and operates autonomously across environments that include AWS and hybrid data centers. The position requires strong software engineering skills and treats reliability as a core engineering responsibility.

Benefits

Medical plan
Vision plan
Dental plan
Life insurance
AD&D insurance
Short term disability
Long term disability
Pension plan
Employer match
Maternity leave
Paternity leave
Parental leave
Employee and family assistance program
Education assistance
Employee referral program
Internal recognition program
PPO options
High-deductible options
Health savings account
Flexible spending accounts

Qualifications

8+ years of hands-on Site Reliability Engineering or reliability-focused engineering experience with end-to-end service ownership
Proven track record of operating at senior engineering scope, with accountability for reliability outcomes
Strong software engineering skills
Practical experience applying SRE principles (SLIs, SLOs, error budgets)
Hands-on experience with AWS, Kubernetes, CI/CD, infrastructure as code, and hybrid environments
Strong knowledge of Linux and Windows systems, application platforms, and relational databases
Bachelor’s or master’s degree in computer science or equivalent experience
Participation in an on-call rotation
Ability to work flexible hours as required
High-speed internet for remote work

Responsibilities

Own production services end-to-end
Ensure reliability, availability, scalability, performance, and operational health
Define and manage SLIs and SLOs
Use error budgets to guide delivery decisions
Influence service and system design for improved fault tolerance, observability, and operational sustainability
Debug complex production issues across application code, services, and infrastructure
Perform root cause analysis using logs, metrics, traces, and code-level investigation
Build automation and self-healing mechanisms to prevent repeat failures
Execute production changes with safety, automation, and observability
Design and operate production observability aligned to service health and customer impact
Lead and participate in incident response for high-severity events
Collaborate with engineering, product, architecture, and operations teams
Operate with autonomy and sound judgment in reliability decisions

Skills

AWS
C#
.NET
Java
Kubernetes
Linux
Python
Windows

Degrees

Bachelor’s degree in computer science
Master’s degree in computer science

Languages

C#
.NET
Java
Python
React

Work schedule

On-call rotation
Flexible hours

Travel

Occasional travel to office

Industry

Insurance
