Senior Production Engineer

Anduril Industries • $166K — $220K *

Washington, DC 20011 • In-Person

Enterprise Technology

Less than 5 years of experience

Reposted 1 week ago

Job Overview by Ladders

Qualifications

4+ years in SRE, platform engineering, or backend development roles
Production-quality Go experience, modifying core platform services
Deep practical experience with distributed systems and failure modes
Kubernetes knowledge sufficient to understand service operations
Proven ability to debug complex systems and trace failures

Responsibilities

Diagnose and fix stability vulnerabilities in core platform services
Implement resilience patterns directly in service code
Design multi-replica support for existing services
Collaborate with service owners on testing and validation
Trace cascading failures across service boundaries to root causes
Contribute to enhancements of the observability platform
Perform light infrastructure work with Terraform and Kubernetes

Benefits

Comprehensive benefits package with little to no cost for employees
Support for health and recovery
Competitive equity grants included in total compensation
Top-tier benefits for full-time employees

Full Job Description

ABOUT THE TEAM

The SRE team owns reliability and infrastructure for Anduril's cloud deployments. We operate Kubernetes clusters, Terraform infrastructure, and observability platforms across 10+ production environments supporting active defense contracts. When platform services break under real operational load, we're the team that fixes them - often at the code level, not just the config level.

ABOUT THE JOB

We are looking for a Senior Production Engineer to join our team in Costa Mesa, CA (or DC). In this role, you will be responsible for diagnosing and fixing stability vulnerabilities in core platform services that cause cascading failures in multi-tenant cloud deployments. You will write production Go to implement resilience patterns - leader election, circuit breakers, failure domain isolation - directly in service code. This will require deep experience with distributed systems, debugging complex failure modes across service boundaries, and writing production-quality Go. If you are someone who thrives on fixing hard reliability problems in live systems rather than building greenfield, this role is for you.

WHAT YOU'LL DO

Diagnose and fix stability vulnerabilities in core platform services that cause cascading failures under multi-replica, multi-tenant operation
Implement resilience patterns (leader election, circuit breakers, failure domain isolation) directly in service code
Design multi-replica support for services that currently assume single-instance operation
Collaborate with service owners on contract testing and upgrade validation
Trace cascading failures across service boundaries and drive them to root-cause fixes
Contribute to observability platform improvements to support service stability
Light infrastructure work: Terraform/Kubernetes changes to support service fixes (~20% of time)

REQUIRED QUALIFICATIONS

Production-quality Go - you'll be modifying core platform services, not writing scripts
Practical experience with distributed systems: leader election, consensus, replication, failure modes
Kubernetes - enough to understand how services run (not necessarily cluster administration)
Debugging complex systems - tracing cascading failures across service boundaries
4+ years in SRE, platform engineering, or backend development roles
Must be a U.S. Person due to required access to U.S. export controlled information or facilities

NICE-TO-HAVE QUALIFICATIONS

Rust (some platform services use it)
Experience fixing reliability problems in production services (not just building greenfield)
Familiarity with gRPC service architectures
HashiCorp Consul or similar service discovery/mesh
FedRAMP/IL5 compliance environment experience
ArgoCD / GitOps workflows

US Salary Range

$166,000-$220,000 USD

The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. The actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full-time offers and are considered part of Anduril's total compensation package. Additionally, Anduril offers top-tier benefits for full-time employees, including:

Benefits

At Anduril, we invest in our people. Our comprehensive, competitive benefits package (available at little to no cost to employees) ensures you're supported in health, recovery, and whatever comes next.

About Anduril Industries

Anduril Industries is a defense technology company that develops advanced systems for the military. The company was founded in 2017 by a group that includes Palmer Luckey, Trae Stephens, and Matt Grimm, and has since grown to become a major player in the defense industry. Anduril's products include autonomous drones, surveillance systems, and other advanced technologies that are designed to enhance military capabilities. The company has received significant funding from investors and has partnerships with several major defense contractors. Anduril is headquartered in Costa Mesa, California.

Size

200 employees

Industry

Aerospace & Defense

Founded

2017

* Ladders Estimates
