SentinelOne

Staff Infrastructure Engineer - Observability

SentinelOne
$132K - $215K *
US-Anywhere
Remote in United States
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 8+ years in Infrastructure Engineering or Site Reliability Engineering (SRE)
  • 8+ years architecting and managing observability stacks (Prometheus, Grafana, etc.)
  • Experience with cloud-native infrastructure on AWS or GCP
  • Advanced proficiency in Infrastructure as Code (IaC) with Terraform and Ansible
  • Demonstrated experience leading technical design and mentoring engineers
  • U.S. Citizenship required due to government contract needs

Responsibilities

  • Architect and implement scalable telemetry platforms for engineering teams
  • Serve as the SME and administrator for observability infrastructure
  • Collaborate with diverse teams to define and evolve platform requirements
  • Take ownership of observability features from design to deployment
  • Enhance operational efficiency while optimizing cloud costs
  • Develop automation tools to minimize operational toil
  • Ensure compliance of observability systems in high-security environments

Benefits

  • Restricted Stock Units (RSUs)
  • Employee Stock Purchase Plan (ESPP)
  • Flexible time off and paid sick time
  • Gender-neutral parental leave and grandparent leave
  • Comprehensive medical, dental, and vision coverage
  • 401(k) retirement plan with company match
  • Wellness programs and reimbursement options for fitness and fertility

Full Job Description

As a Staff Infrastructure Engineer, you'll be a pivotal technical leader and architect within our Observability team, driving strategic initiatives and shaping the future of our critical systems. You will leverage your deep expertise to design, implement, and optimize solutions that underpin SentinelOne's global platform, directly empowering engineering teams across the organization.

We are seeking a candidate who is driven by a deep passion for observability and technical leadership. Imagine architecting the core systems that provide SentinelOne with real-time, global visibility, delivering actionable platform insights precisely when they are needed. In this high-impact role, you'll design and implement robust, secure solutions for high-volume data ingestion, storage, and analysis, fundamentally shaping how we understand and optimize our platform health. This is your chance to take end-to-end ownership of critical infrastructure, mentor talented engineers, and profoundly accelerate software delivery across our entire engineering organization.

Due to Federal Government contract requirements, U.S. Citizenship is required for this position. FedRAMP staff may be subject to customer or third party background checks up to and including Secret Clearance if required by their role at SentinelOne.

What Will You Do?

Primary responsibilities include:

  • Architect and implement robust, scalable telemetry platforms that empower SentinelOne engineers to deploy and monitor features with speed, safety, and reliability.
  • Act as the primary Subject Matter Expert (SME) and administrator for our core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
  • Partner strategically with diverse engineering teams across the organization to define platform requirements, ensuring the observability ecosystem evolves ahead of stakeholder needs.
  • Take complete ownership of critical features, from initial architectural design and requirements refinement through to production deployment and operational maturity.
  • Drive exemplary operational efficiency for critical observability services across AWS and GCP, meticulously balancing unwavering system reliability with smart cloud cost-optimization.
  • Build robust automation and self-service tooling to drastically reduce operational toil, optimize resource utilization, and minimize pager fatigue.
  • Drive the deployment, maintenance, and compliance of observability systems in critical, high-security environments, including FedRAMP and air-gapped deployments.
  • Cultivate platform transparency and reliability by rigorously implementing IaC (Terraform/Ansible) and standardizing industry best practices.
  • Elevate engineering quality by mentoring team members, leading comprehensive technical design and code reviews, and providing constructive feedback that fosters growth.
  • Lead the swift resolution of highly complex production incidents, perform thorough root-cause analyses, and participate in on-call rotations to ensure peak system integrity.

What Skills and Knowledge Should You Bring?

Ideal candidates will have:

  • 8+ years experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
  • 8+ years experience in architecting, scaling, and managing enterprise-grade observability stacks utilizing Prometheus, Grafana, Thanos (or Mimir/Cortex), and OpenTelemetry (OTEL).
  • Experience design-engineering cloud-native infrastructure within major cloud providers (AWS or GCP) and managing production Kubernetes environments (EKS, GKE).
  • Advanced proficiency with IaC and automation tools, specifically Terraform and Ansible, to manage immutable infrastructure.
  • Experience maintaining and optimizing high-throughput, large-scale distributed systems with a focus on cost-efficiency, scalability, and disaster recovery.
  • Demonstrated ability to lead complex technical designs, mentor other engineers, and collaborate cross-functionally with product and application teams.
  • US Citizenship and the ability to work in a government-regulated environment.

Preferred Qualifications

  • 8+ years production-level programming experience in GoLang (highly desirable) or another mainstream language (e.g., Python, Java) with a strong willingness to adopt GoLang.
  • Experience working with high-security compliance frameworks, specifically FedRAMP or other sovereign cloud requirements.
  • Familiarity with the unique operational challenges of on-premises, hybrid, or air-gapped Kubernetes deployments.
  • Experience designing advanced CI/CD pipelines (e.g., GitHub Actions) and implementing sophisticated deployment strategies (canary, blue-green, rolling updates).

We invest in our Sentinels with comprehensive, competitive benefits designed to support you and your family:

Equity & Rewards

  • Restricted Stock Units (RSUs)
  • Employee Stock Purchase Plan (ESPP)

Time Off & Wellbeing

  • Flexible time off
  • Paid company holidays and paid sick time
  • Gender-neutral parental leave
  • Grandparent leave

Insurance & Financial Security

  • Medical, dental, and vision coverage
  • 401(k) retirement plan with company match
  • Life and disability insurance
  • Health and dependent care FSA
  • Voluntary benefits (hospital, accident, critical illness)
  • Employee Assistance Program (EAP)
  • ARAG pre-paid legal
  • Nationwide pet insurance
  • Cancer Care program
  • Global business travel medical insurance

Work Perks & Flexibility

  • Home office allowance
  • Mobile phone reimbursement

Wellness & Lifestyle

  • Wellness coach
  • Wellness/gym reimbursement
  • Fertility coverage
  • Adoption & surrogacy reimbursement

This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range: $132,000-$215,000 USD

About SentinelOne

SentinelOne is a cybersecurity company that provides endpoint security solutions to protect businesses from cyber threats. The company's platform uses artificial intelligence and machine learning to detect and respond to threats in real time. SentinelOne serves clients in a variety of industries, including healthcare, finance, and government. The company was founded in 2013 and is headquartered in Palo Alto, California.
Size: 1,000 employees
Founded: 2013
