Site Reliability / Infrastructure Engineer

Medal

• $120K — $160K *

New York, NY 10025 (In-Person)

Information Technology

Less than 5 years of experience

Today


Job Overview by Ladders

Qualifications

5-7 years of experience in site reliability engineering or related fields
Deep understanding of GCP services, particularly Kubernetes and managed databases
Extensive experience with database scaling, sharding, and optimization in production
Proficient in Terraform and infrastructure-as-code practices
Hands-on experience with Elasticsearch cluster management
Strong incident response skills and experience in postmortem analysis
Fluency in CI/CD processes and tools, specifically GitHub Actions

Responsibilities

Own reliability across GCP infrastructure focusing on availability and latency
Lead incident response processes including on-call duties and postmortem analysis
Architect and implement database scaling strategies for MySQL and PostgreSQL
Collaborate with product teams to design infrastructure that meets growing feature demands
Manage and enhance Terraform and Kubernetes configurations
Oversee Elasticsearch cluster management, including performance tuning and capacity planning
Develop and maintain observability tools like metrics and alerting systems

Benefits

Competitive salary and equity opportunities
Comprehensive health insurance including medical, dental, and vision
401(k) retirement savings plan
Wellness programs including fitness memberships and mental health resources
Paid parental leave with additional fertility support
Generous paid time off policy
Daily meal provisions and commuter benefits available on-site
Stipends for professional development and continuous learning

Full Job Description

The Role

Medal's infrastructure handles billions of clips, video ingestion pipelines, and social features at a massive scale most engineers never get to touch. We're looking for an SRE who cares deeply about reliability and scalability.

The work centers on reliability, incident response, scaling, and making sure our infrastructure keeps up with our growth. You'll own the on-call rotation, drive postmortems, and work directly with engineering teams to meet their infra needs.

The right person probably came through startups and scale-ups. You've been in the room when things broke at 2am, you've scaled databases under pressure, and you know the difference between a durable fix and a patch that buys you a week.

Key Responsibilities

Own reliability across our GCP infrastructure: Kubernetes clusters, managed services, and data pipelines, driving measurable improvements to availability and latency
Lead incident response end-to-end: on-call rotations, runbooks, postmortems, and the follow-through that makes sure the same thing doesn't happen twice
Architect and execute database scaling strategies (sharding, replication, query optimization, and capacity planning) across MySQL and Postgres at meaningful scale
Partner with product engineering to translate feature requirements into infrastructure designs that hold up as we grow
Manage and evolve our Terraform-managed GCP environment and Kubernetes cluster configurations
Own our Elasticsearch cluster end-to-end: capacity planning, sharding strategy, index lifecycle management, version upgrades, and performance tuning at production scale
Build and maintain observability across the stack: metrics, dashboards, alerting, and tracing
Continuously improve CI/CD reliability and delivery pipelines in GitHub Actions
Harden IAM, secrets management, and network segmentation as part of normal infra hygiene

About You

You've worked at startups and are comfortable in an environment of rapid growth where scaling up is a priority
You have great judgment: you know the difference between a durable, sustainable fix and a patch that buys you a week
You have deep, hands-on experience scaling and sharding relational databases in production environments
You know GCP maybe a little too well: Kubernetes, VPC, IAM, Cloud Logging, and the managed services ecosystem
You are fluent in Terraform and have owned real infrastructure-as-code at scale
You've operated Elasticsearch in production and know how to keep a cluster healthy
You have strong incident response instincts: you can work a P0 calmly, communicate clearly under pressure, and run a postmortem that prevents recurrence
You've worked with GitHub Actions in a production CI/CD environment
You have excellent communication skills (this is crucial!): you can flag issues clearly and quickly during incidents, and you can lead and write actionable postmortems

Our Stack

Google Cloud Platform

Terraform, Salt, GitHub Actions

Java, Redis, RabbitMQ, Elasticsearch, BigQuery, and Kubernetes for the backend

Electron+React

C# and C++ for native Windows recording and more

Swift for iOS, Kotlin for Android

Benefits

Competitive salary and meaningful equity
Comprehensive medical, dental, and vision coverage
401(k)
Wellness and fitness perks including a Wellhub membership and mental health resources
Paid parental leave, fertility and maternal health benefits
Generous PTO policy
Daily meals and commuter benefits at our NYC HQ in Flatiron
Learning and development stipend

Benefits vary by country and employment type.

* Ladders Estimates




