Lead Platform Engineer
Koalafi · Richmond, VA, United States
Richmond, VA, United States · Exp: 7+ yrs · 174,000-236,000 USD/yearly · Remote
Remuneration
174,000-236,000 USD/yearly
Location
Richmond, VA, United States
Visa sponsorship
Not specified
Job summary
Koalafi is seeking a Lead Platform Engineer to manage a team and drive the delivery of platform engineering initiatives. The role combines people leadership (team performance, growth, and culture) with significant hands-on technical work, translating strategic direction into executable tasks. The ideal candidate will champion AI-assisted development practices and ensure operational readiness.
Benefits
- Comprehensive medical, dental, and vision coverage
- 20 PTO days + 11 paid holidays
- 401(k) retirement with company matching
- Student Loan & Tuition Reimbursement
- Commuter assistance
- Parental leave (maternal + paternal)
- Inclusion and Associate Engagement Programs
Qualifications
- Demonstrated history as a formal people manager or tech lead with direct reports, including performance management, career development conversations, and building team capability
- Experience owning timelines, driving sprint execution, and being directly accountable for team deliverables
- 7+ years of hands-on experience in cloud infrastructure/platform engineering with demonstrated scope growth and increasing leadership responsibility
- Strong hands-on experience with Terraform in production (modules, patterns, environment strategy, state management)
- Strong hands-on experience operating Kubernetes in production
- Strong AWS fundamentals: practical experience with compute, networking, IAM, and production operations
- Experience building and maintaining CI/CD pipelines
- Strong observability fundamentals including metrics, logging, distributed tracing, SLO/SLI design, and alerting strategy
- Experience building automation using Bash and at least one general-purpose language (Python or Go)
- Strong troubleshooting skills: driving root cause analysis and implementing long-term fixes
- Hands-on experience using AI coding tools (e.g., GitHub Copilot, Cursor, Claude) as a productivity multiplier
- Comfort partnering with senior technical peers and sound judgment on when to consult versus decide independently
- Experience with Istio or other service mesh technologies (preferred)
- Experience operating relational databases in AWS (RDS PostgreSQL/Aurora/MS SQL) (preferred)
- Experience with AWS Lambda or serverless architectures (preferred)
- Experience improving reliability for distributed systems at scale (preferred)
- Prior experience as a technical anchor or team lead in a platform or infrastructure context (preferred)
- Experience building or operating infrastructure that supports AI/ML workloads (compute, storage, serving patterns) in AWS (preferred)
Responsibilities
- Manage a team of engineers with full accountability for their performance, growth, and wellbeing
- Own 1:1s, performance reviews, and career development conversations, providing direct, constructive feedback
- Build a team culture that is organized, reliable, and focused on impact
- Mentor engineers through code reviews, pairing, and delivery coaching
- Partner with the VP on people concerns that require escalation; help shape team working norms and resolve friction
- Be a strong technical contributor, carrying significant engineering weight and actively delivering high-impact work
- Own day-to-day technical decisions within the team's scope
- Translate architectural direction into sprint-level tasks
- Build and evolve CI/CD pipelines and delivery automation, ensuring deployment safety, consistency, and velocity
- Improve observability and operational readiness across metrics, logging, distributed tracing, and alerting
- Design and implement automation and self-service workflows using infrastructure-as-code, APIs, and developer platforms
- Implement secure delivery practices with policy-driven pipeline controls
- Contribute to infrastructure in Terraform, working within established architectural patterns and standards
- Support and improve secrets management patterns across runtime and CI/CD workflows
- Champion AI-assisted development practices across the team, including prompt engineering workflows, AI-powered code review, and tooling integrations
- Own incident response coordination, driving the process, communicating status, and ensuring issues reach the right people
- Participate in the on-call rotation and help drive improvements that reduce incidents and alert noise
- Build and maintain operational runbooks, escalation paths, and documentation for team-owned systems
- Drive production readiness as a continuous standard
Skills
AWS, Bash, GitHub, Go, IAM, Istio, Kubernetes, AWS Lambda, PostgreSQL, Python, Terraform, Git, GitHub Actions
Languages
Bash, Python, Go
Relocation
No