Site Reliability Engineer - HPC & Automation (Silicon Engineering)

SpaceX • $125K — $175K *

Redmond, WA 98052 • In-Person

Technical Services

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in computer science, information systems, or engineering; OR 2+ years in system administration or site reliability engineering.
1+ years of development experience in Bash, Python, or similar languages.
1+ years of experience with Linux operating systems.

Responsibilities

Deploy, upgrade, operate, and scale clusters and services.
Collaborate with engineers to create automated solutions for silicon simulation workflows.
Manage infrastructure as code and use observability tools to monitor health.
Operate continuous integration pipelines and version control systems.
Identify and resolve performance bottlenecks creatively.

Benefits

Comprehensive medical, vision, and dental coverage.
401(k) retirement plan and short/long-term disability insurance.
Paid parental leave and paid vacation (3 weeks) with 10+ holidays.
Access to employee stock purchase program with discounts.
Company shuttles for commuting to the office.

Full Job Description

SITE RELIABILITY ENGINEER - HPC & AUTOMATION (SILICON ENGINEERING)

At SpaceX we're leveraging our experience in building rockets and spacecraft to deploy Starlink, the world's most advanced broadband internet system. Starlink is the world's largest satellite constellation and is providing fast, reliable internet to millions of users worldwide. We design, build, test, and operate all parts of the system - thousands of satellites, consumer receivers that allow users to connect within minutes of unboxing, and the software that brings it all together. We've only begun to scratch the surface of Starlink's potential global impact and are looking for best-in-class engineers to help maximize Starlink's utility for communities and businesses around the globe.

We are seeking a motivated, proactive, and intellectually curious engineer who will work alongside world-class cross-disciplinary teams (systems, firmware, architecture, design, validation, product engineering, ASIC implementation). As a Site Reliability Engineer on the Silicon Engineering team, you will design, operate, scale, and automate the high performance computing infrastructure we use to develop the chips powering the world's largest satellite constellation and a global internet service. This position will have a meaningful impact on Starlink silicon by speeding up the design iterations, simulations, and regression turnaround times that gate how fast our chip teams can ship.

RESPONSIBILITIES:

Deploy, upgrade, operate, maintain, and scale our suite of clusters and services
Collaborate with engineers to develop automated, full turnkey solutions for silicon simulation workflows to speed up project timelines
Manage our underlying infrastructure as code and use modern observability tools to provide a complete picture of cluster and infrastructure health
Operate the continuous integration pipeline, build and release systems, and version control across the environment
Identify and eliminate performance bottlenecks using measurement and creative engineering

BASIC QUALIFICATIONS:

Bachelor's degree in computer science, information systems, or an engineering discipline; OR 2+ years of professional experience in system administration, high performance computing, or site reliability engineering
1+ years of development experience with Bash, Python, and/or other programming languages
1+ years of experience with Linux operating systems

PREFERRED SKILLS AND EXPERIENCE:

Familiarity with containerization technologies (e.g., Docker, Kubernetes)
Knowledge of computer system concepts (computer architecture, computer organization, operating systems, and concurrency)
Experience with databases and data modeling (e.g., MySQL, PostgreSQL, SQLite)
Networking knowledge of TCP/IP
Experience with high performance computing and workload managers (e.g., Slurm, LSF)
Experience with Terraform, Ansible, Puppet, or similar automation frameworks
Experience building monitoring and alerting as code (e.g., Grafana, Prometheus, custom exporters)
Experience with CI/CD automation at scale (e.g., Jenkins, Bamboo, build systems)
Experience with infrastructure as code (IaC) tools for managing fleets of servers
Experience using and building REST API clients and servers
Experience with enterprise/networked storage automation (e.g., NetApp ONTAP REST API/CLI, NFS)
Experience with ASIC design flows and tools (e.g., Cadence, Synopsys, Ansys, Keysight, Siemens)
Strong desire to find performance bottlenecks and performance improvement techniques
Excellent communication skills with the ability to communicate with customers, peers, management, etc. in both formal and informal situations
Ability to quickly learn new tools and frameworks
Interest in or experience with AI/LLM-assisted tooling (e.g., Grok, Claude Code)

ADDITIONAL REQUIREMENTS:

Ability to work extended hours and weekends as needed to meet critical milestones

COMPENSATION AND BENEFITS:

Pay Range:
Level 1: $125,000.00 - $150,000.00
Level 2: $145,000.00 - $175,000.00

Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.

Base salary is just one part of your total rewards package at SpaceX. You may also be eligible for long-term incentives, in the form of company stock or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short and long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation and will be eligible for 10 or more paid holidays per year. Employees in Washington State accrue paid sick time in compliance with state and federal law. Company shuttles are offered to employees for roundtrip travel from select Seattle locations to the SpaceX Redmond office Monday to Friday.

About SpaceX

SpaceX is an American aerospace manufacturer and space transportation services company founded in 2002 by entrepreneur Elon Musk. The company designs, manufactures, and launches advanced rockets and spacecraft. SpaceX has developed the Falcon 1, Falcon 9, and Falcon Heavy launch vehicles and the Dragon spacecraft. The company was founded with the goal of reducing space transportation costs and enabling the colonization of Mars. SpaceX has achieved several milestones in spaceflight: it launched the first privately funded liquid-propellant rocket to reach orbit, and it was the first privately funded company to send a spacecraft to the International Space Station and the first to send a human-rated spacecraft to orbit.

Size

8,000 employees

Industry

Aerospace & Defense

Founded

2002

* Ladders Estimates
