Staff Infrastructure Engineer - Observability

SentinelOne • $132K — $215K *

US-Anywhere (Remote in United States)

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

8+ years in Infrastructure Engineering or Site Reliability Engineering (SRE)
8+ years architecting and managing observability stacks (Prometheus, Grafana, etc.)
Experience with cloud-native infrastructure on AWS or GCP
Advanced proficiency in Infrastructure as Code (IaC) with Terraform and Ansible
Demonstrated experience leading technical design and mentoring engineers
U.S. Citizenship required due to government contract needs

Responsibilities

Architect and implement scalable telemetry platforms for engineering teams
Serve as the SME and administrator for observability infrastructure
Collaborate with diverse teams to define and evolve platform requirements
Take ownership of observability features from design to deployment
Enhance operational efficiency while optimizing cloud costs
Develop automation tools to minimize operational toil
Ensure compliance of observability systems in high-security environments

Benefits

Restricted Stock Units (RSUs)
Employee Stock Purchase Plan (ESPP)
Flexible time off and paid sick time
Gender-neutral parental leave and grandparent leave
Comprehensive medical, dental, and vision coverage
401(k) retirement plan with company match
Wellness programs and reimbursement options for fitness and fertility

Full Job Description

As a Staff Infrastructure Engineer, you'll be a pivotal technical leader and architect within our Observability team, driving strategic initiatives and shaping the future of our critical systems. You will leverage your deep expertise to design, implement, and optimize solutions that underpin SentinelOne's global platform, directly empowering engineering teams across the organization.

We are seeking a candidate who is driven by a deep passion for observability and technical leadership. Imagine architecting the core systems that provide SentinelOne with real-time, global visibility, delivering actionable platform insights precisely when they are needed. In this high-impact role, you'll design and implement robust, secure solutions for high-volume data ingestion, storage, and analysis, fundamentally shaping how we understand and optimize our platform health. This is your chance to take end-to-end ownership of critical infrastructure, mentor talented engineers, and profoundly accelerate software delivery across our entire engineering organization.

Due to Federal Government contract requirements, U.S. Citizenship is required for this position. FedRAMP staff may be subject to customer or third party background checks up to and including Secret Clearance if required by their role at SentinelOne.

What Will You Do?

Primary responsibilities include:

- Architect and implement robust, scalable telemetry platforms that empower SentinelOne engineers to deploy and monitor features with speed, safety, and reliability.
- Act as the primary Subject Matter Expert (SME) and administrator for our core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
- Partner strategically with diverse engineering teams across the organization to define platform requirements, ensuring the observability ecosystem evolves ahead of stakeholder needs.
- Take complete ownership of critical features, from initial architectural design and requirements refinement through to production deployment and operational maturity.
- Drive exemplary operational efficiency for critical observability services across AWS and GCP, meticulously balancing unwavering system reliability with smart cloud cost-optimization.
- Build robust automation and self-service tooling to drastically reduce operational toil, optimize resource utilization, and minimize pager fatigue.
- Drive the deployment, maintenance, and compliance of observability systems in critical, high-security environments, including FedRAMP and air-gapped deployments.
- Cultivate platform transparency and reliability by rigorously implementing IaC (Terraform/Ansible) and standardizing industry best practices.
- Elevate engineering quality by mentoring team members, leading comprehensive technical design and code reviews, and providing constructive feedback that fosters growth.
- Lead the swift resolution of highly complex production incidents, perform thorough root-cause analyses, and participate in on-call rotations to ensure peak system integrity.

What Skills and Knowledge Should You Bring?

Ideal candidates will have:

- 8+ years experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
- 8+ years experience in architecting, scaling, and managing enterprise-grade observability stacks utilizing Prometheus, Grafana, Thanos (or Mimir/Cortex), and OpenTelemetry (OTEL).
- Experience design-engineering cloud-native infrastructure within major cloud providers (AWS or GCP) and managing production Kubernetes environments (EKS, GKE).
- Advanced proficiency with IaC and automation tools, specifically Terraform and Ansible, to manage immutable infrastructure.
- Experience maintaining and optimizing high-throughput, large-scale distributed systems with a focus on cost-efficiency, scalability, and disaster recovery.
- Demonstrated ability to lead complex technical designs, mentor other engineers, and collaborate cross-functionally with product and application teams.
- US Citizenship and the ability to work in a government-regulated environment.

Preferred Qualifications

- 8+ years production-level programming experience in GoLang (highly desirable) or another mainstream language (e.g., Python, Java) with a strong willingness to adopt GoLang.
- Experience working with high-security compliance frameworks, specifically FedRAMP or other sovereign cloud requirements.
- Familiarity with the unique operational challenges of on-premises, hybrid, or air-gapped Kubernetes deployments.
- Experience designing advanced CI/CD pipelines (e.g., GitHub Actions) and implementing sophisticated deployment strategies (canary, blue-green, rolling updates).

We invest in our Sentinels with comprehensive, competitive benefits designed to support you and your family:

Equity & Rewards

- Restricted Stock Units (RSUs)
- Employee Stock Purchase Plan (ESPP)

Time Off & Wellbeing

- Flexible time off
- Paid company holidays and paid sick time
- Gender-neutral parental leave
- Grandparent leave

Insurance & Financial Security

- Medical, dental, and vision coverage
- 401(k) retirement plan with company match
- Life and disability insurance
- Health and dependent care FSA
- Voluntary benefits (hospital, accident, critical illness)
- Employee Assistance Program (EAP)
- ARAG pre-paid legal
- Nationwide pet insurance
- Cancer Care program
- Global business travel medical insurance

Work Perks & Flexibility

- Home office allowance
- Mobile phone reimbursement

Wellness & Lifestyle

- Wellness coach
- Wellness/gym reimbursement
- Fertility coverage
- Adoption & surrogacy reimbursement

This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range $132,000-$215,000 USD

About SentinelOne

SentinelOne is a cybersecurity company that provides endpoint security solutions to protect businesses from cyber threats. The company's platform uses artificial intelligence and machine learning to detect and respond to threats in real-time. SentinelOne serves clients in a variety of industries, including healthcare, finance, and government. The company was founded in 2013 and is headquartered in Palo Alto, California.

Size

1,000 employees

Industry

Enterprise Technology

Founded

2013

* Ladders Estimates
