Senior Network & Site Reliability Engineer

Alembic

• $130K — $180K *

San Francisco, CA 94112 (In-Person)

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

8+ years in network or infrastructure engineering, with 5+ years in datacenter operations and/or systems and network administration.
Strong background in network security, architecture, design, and operations.
Hands-on experience with network devices (firewalls, switches, load balancers) and large-scale architectures and protocols.
Experience designing and operating modern datacenter network fabrics (spine-leaf, EVPN/VXLAN, ECMP).
Proficient in network automation and Infrastructure as Code (IaC) tooling (Ansible, Terraform, etc.).
Familiarity with Kubernetes networking and strong operational experience with Linux-based infrastructures.
Solid scripting skills (Python, Bash) for debugging and automation.

Responsibilities

Design and operate a scalable, secure network architecture that meets high-security requirements.
Own network device configuration management end to end for reliability.
Enhance system and network reliability through automation and proactive planning.
Implement and manage complex network protocols and external connectivity.
Establish monitoring, alerting, and incident response for performance optimization.
Ensure security, compliance, and operational readiness in network and cloud infrastructure.
Collaborate across teams to foster a culture of performance and reliability.

Benefits

Opportunity to work with one of the world’s fastest private supercomputers.
Influence architecture decisions in a mission-critical environment.
Challenging technical problems focused on real-world scalability and reliability.
Autonomy to own the infrastructure rather than fill a traditional maintenance role.

Full Job Description

About the Role

We're building infrastructure that has to perform under real-world scale, reliability, and security demands - and we're looking for an engineer who wants to own the foundation it runs on. This isn't a traditional "keep the lights on" role.

You'll design and operate the global network and reliability layer behind one of the world's fastest private supercomputers - the fabric powering distributed compute, ML workloads, real-time analytics, and mission-critical enterprise systems. You'll work across networking, systems, automation, observability, and reliability engineering to scale a platform where performance genuinely matters, with real influence over architecture decisions.

It's a strong fit if you like solving deep infrastructure problems, building resilient systems, automating everything repetitive, and owning architecture rather than just maintaining it.

What You'll Do

Design and operate a scalable, secure network architecture that meets high-security requirements and supports large-scale machine learning workloads.
Own network device configuration management end to end, ensuring consistency and reliability across the fleet.
Improve system and network reliability and performance through automation, observability, and proactive capacity planning.
Implement and manage complex network protocols and connectivity, including BGP, VPNs, WAN circuits, and external peering.
Build and maintain comprehensive monitoring, alerting, and incident response - SLOs, runbooks, and on-call rotations - and drive post-incident analysis and continuous improvement.
Ensure security, compliance, and operational readiness across our network and cloud infrastructure.
Partner across engineering and data science to drive a culture of performance and reliability.

What Will Help You Succeed

8+ years in network or infrastructure engineering, including 5+ years in datacenter operations and/or systems and network administration.
A strong background in network security, architecture, design, and operations.
Extensive hands-on experience with network devices (firewalls, switches, load balancers) and large-scale architectures and protocols - BGP, QoS, MPLS, and IPsec VPNs.
Experience designing and operating modern datacenter network fabrics (spine-leaf, EVPN/VXLAN, ECMP).
Network automation and IaC tooling (Ansible, Terraform, Nornir, or similar), plus IPAM/DCIM platforms (NetBox, Infoblox, or similar).
WAN engineering - carrier circuit provisioning and external network peering.
Familiarity with Kubernetes networking (CNI plugins, ingress, service networking, network policy) and strong operational experience with Linux-based production infrastructure.
Experience with monitoring and observability stacks (Prometheus, Grafana, Datadog, ELK, OpenTelemetry).
Solid scripting (Python, Bash) to debug complex network and system issues and automate solutions, plus excellent cross-functional communication.

Also Helpful

NVIDIA networking technologies - Cumulus Linux, InfiniBand, Spectrum-X, and BlueField DPUs (this is the fabric behind our SuperPOD).
Familiarity with data-intensive platforms (Spark, Airflow, Kafka) and storage network protocols (NFS, Lustre, iSCSI).
Security practices for applications and infrastructure, and experience in high-compliance or SOC 2 environments.

The Role Is Right for You If

You want to own mission-critical network and infrastructure end to end - from architecture to incident management - not just keep it running.
You'd rather build and automate than direct from a distance, and you want meaningful influence over how a high-performance platform scales.

* Ladders Estimates
