ROBLOX Corporation

Senior Site Reliability Engineer, Compute

ROBLOX Corporation | $243K - $295K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science or related field; equivalent experience considered.
  • 6+ years experience as a Site Reliability Engineer or Software Engineer.
  • Fluency in high-level programming languages such as Go, Java, or C#.
  • Experience with Kubernetes or similar orchestration systems; knowledge of Nomad, Vault, and Consul is a plus.
  • Proven ability to build reliable software and tools that gain adoption.

Responsibilities

  • Design and develop systems for fault-tolerance and resilience.
  • Promote reliability best practices and lead initiatives within the Infra Compute group.
  • Build, automate, and standardize tooling for a seamless development process.
  • Create tooling to evaluate production readiness with load testing.
  • Establish performance monitoring and observability services for capacity and degradation insights.
  • Analyze system designs for production readiness.

Benefits

  • Eligible for equity compensation.
  • Health, dental, and vision insurance coverage.
  • Generous paid time off and holiday policy.
  • Flexible work arrangement with hybrid on-site/remote options.
  • Continuous learning and professional development opportunities.

Full Job Description

The Infrastructure Compute Site Reliability Engineering mission is to own and manage the successful operation of our underlying cell infrastructure system, along with elements of service discovery, secrets management and related software layers. We're looking for a skilled Senior Site Reliability Engineer with strong programming skills to help us build Roblox's private cloud, productionize our growing Kubernetes-based infrastructure, and institute reliability best practices across the Roblox Compute team.
You will:
  • Design and develop systems and libraries that promote fault tolerance and resilience, automate much of the management and lifecycle of our clusters, and ensure systems are observable.
  • Promote and institute reliability best practices across the Infra Compute group and drive common reliability initiatives. Provide collaborative technical reviews and operational guidance to strengthen system reliability.
  • Build, automate, and standardize processes to create a "golden path" of tooling and platform support that powers the fundamental Roblox ecosystem.
  • Create tooling that provides production guardrails by load testing release candidates to evaluate their capacity before they are deployed to production.
  • Create performance monitoring and observability services for understanding capacity issues and platform degradations and for monitoring production services and their changes, such as generalized canarying services with alerting.
  • Analyze systems and system designs for production readiness.
You have:
  • A Bachelor's degree (or equivalent professional experience) in Computer Science or a related engineering field, with a proven track record including at least 6 years as an SRE or Software Engineer.
  • Fluency with high-level programming languages such as Go, Java, or C#.
  • Experience with Kubernetes or similar orchestration systems. Experience with Nomad, Vault, and Consul is strongly desired.
  • Experience and good habits around building software and tools and getting them adopted. Your systems focus informs your view that code needs to be deeply reliable.
You are:
  • A Partner: You know that the best tools integrate broadly with the tooling ecosystem. You approach partners and processes with curiosity and seek to understand a problem deeply before you start coding.
  • A Developer: You love building durable and reliable complex systems.
  • Passionate about problem-solving, finding creative work solutions, and addressing unexpected challenges as part of a team.
  • A Problem Solver: You ask the right questions to tackle issues within your expertise, and you use data to test your theories.
  • A Planner: You have experience with large project lifecycles, including working in sprints, breaking down complex tasks into achievable pieces, and reporting status to keep project schedules accurate.


For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range

$243,290-$295,250 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

About ROBLOX Corporation

Roblox Corporation is a video game company that operates a massively multiplayer online game platform. The platform allows users to create and play games in a virtual world, with a focus on user-generated content. Roblox was founded in 2004 and is headquartered in San Mateo, California. The company has grown rapidly in recent years, and now has over 100 million monthly active users. In 2021, Roblox went public through a direct listing on the New York Stock Exchange.
Size: 960 employees
Market Cap: $15.6 billion
Net Income: -$242.8 million
Founded: 2004
Revenue: $727 million
