Site Reliability Engineer

VyncaCare

• $100K — $130K *

Remote in United States

Healthcare

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

3-5 years of experience in Site Reliability Engineering, DevOps, or similar roles, ideally in healthcare or high-growth tech environments.
Bachelor's degree in Computer Science or a related field, or equivalent professional experience.
Hands-on experience with AWS for operating production workloads.
Proficiency in Terraform for infrastructure as code, including module development and state management.
Experience managing production Kubernetes environments and deploying applications using Helm.
Familiarity with distributed systems concepts such as event sourcing and fault tolerance.
Strong problem-solving skills with the ability to troubleshoot complex issues.

Responsibilities

Design and manage AWS infrastructure using Terraform as the source of truth.
Operate and maintain production workloads on Kubernetes.
Package, deploy, and manage apps with Helm and automation tools.
Build and enhance distributed/event-driven systems, focusing on reliability mechanisms.
Monitor SLIs and SLOs to balance system reliability with engineering velocity.
Automate deployment and incident response workflows to enhance system resilience.
Lead incident response and implement long-term reliability improvements.

Benefits

Remote position that requires working East Coast business hours (EST).
Join a mission-driven team in the healthcare technology sector, contributing to improved patient care.
Significant ownership of projects, impacting organization-wide systems and architecture.
A collaborative environment partnering with cross-functional teams including Engineering and Product.
Participation in an on-call rotation, with hands-on experience in real-time operational support.

Full Job Description

About the job

We're looking for a Site Reliability Engineer (E3) to help build and operate the infrastructure that powers Vynca's healthcare technology platform. In this role, you'll work at the intersection of software engineering, cloud infrastructure, and operations to ensure our systems are reliable, scalable, secure, and performant.

As a member of the Technology team, you'll design and manage cloud infrastructure in AWS, operate Kubernetes-based workloads, improve observability across our platform, and automate operational processes that enable engineering teams to move quickly and safely. You'll play a critical role in maintaining the health of our production environment while helping shape the future architecture of our systems.

This is a hands-on engineering role with significant ownership and impact. You'll partner closely with Software Engineers, Product teams, and Data teams to build resilient systems that support our mission of delivering comprehensive care for more quality days at home.

This position is remote and requires working East Coast business hours (EST).

What You'll Do

Design, provision, and manage AWS infrastructure using Terraform as the source of truth.
Operate, maintain, and scale production workloads running on Kubernetes.
Package, deploy, and manage applications using Helm and infrastructure automation tools.
Build, operate, and improve distributed and event-driven systems, including event sourcing, partitioning, event ordering, replay, and failure recovery mechanisms.
Define, monitor, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to balance reliability and engineering velocity.
Develop automation for deployment, scaling, monitoring, incident response, and operational workflows to reduce manual effort and improve system resilience.
Own platform observability by implementing and maintaining metrics, logging, tracing, monitoring, and alerting solutions.
Lead incident response efforts, facilitate blameless postmortems, and drive long-term corrective actions that improve system reliability.
Partner with Product and Engineering teams on capacity planning, performance optimization, and resilient system design.
Implement and maintain security best practices to support HIPAA, SOC 2, and other compliance requirements.
Participate in an on-call rotation and provide operational support for production systems.

Your experience and qualifications

Experience: Three to five (3-5) years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, Cloud Infrastructure Engineering, or similar infrastructure-focused roles, preferably within healthcare, SaaS, or high-growth technology environments.
Education: Bachelor's degree in Computer Science, Information Systems, Software Engineering, or a related technical field; equivalent professional experience will also be considered.
Strong hands-on experience operating production workloads within AWS environments.
Proven experience managing infrastructure as code using Terraform, including module development, state management, and deployment automation.
Experience operating and supporting production Kubernetes environments.
Hands-on experience deploying and managing applications using Helm.
Experience working with distributed systems, event-driven architectures, or event-sourcing platforms, including concepts such as partitioning, event ordering, replay, and fault tolerance.
Experience establishing and managing observability practices including monitoring, logging, tracing, alerting, and incident response.
Strong understanding of Linux systems administration, networking, cloud architecture, and distributed systems fundamentals.
Experience designing, implementing, and maintaining CI/CD pipelines and deployment automation.
Strong problem-solving skills with the ability to troubleshoot complex infrastructure and application issues.
Excellent written and verbal communication skills with the ability to collaborate effectively across technical and non-technical teams.
High level of ownership, accountability, and initiative with a proactive approach to reliability and operational excellence.
Ability and willingness to participate in an on-call rotation supporting production systems.

Preferred Qualifications

Strong programming or scripting experience with Python, Go, or similar languages.
Experience with observability platforms such as Prometheus, Grafana, Datadog, CloudWatch, SigNoz, or OpenTelemetry.
Experience with GitOps tools such as ArgoCD or Flux.
Experience managing databases such as PostgreSQL, MySQL, Redshift, or ClickHouse.
Experience implementing secrets management solutions such as AWS Secrets Manager or HashiCorp Vault.
Experience supporting healthcare technology platforms or other highly regulated environments.
Familiarity with data infrastructure technologies including Snowflake, Redshift, and ETL/ELT pipelines.
Experience with database performance tuning and optimization.

At this time we are only considering applicants in the following states: Arizona, California, Colorado, Florida, Georgia, Illinois, Nevada, North Carolina, Oregon, Texas, and Washington.

Additional Information

The hiring process for this role may consist of applying, followed by a phone screen, online assessment(s), interview(s), an offer, and background/reference checks.
Background Screening: A background check, which may include a drug test or other health screenings depending on the role, will be required prior to employment.
Job Description Scope: This job description is not exhaustive and may include additional activities, duties, and responsibilities not listed herein.
Vaccination Requirement: Employees in patient, client, or customer-facing roles must be vaccinated against influenza. Requests for religious or medical accommodations will be considered but may not always be approved.
Employment Eligibility: Compliance with federal law requires identity and work eligibility verification using E-Verify upon hire.

* Ladders Estimates
