ROBLOX Corporation

Principal Software Engineer, Compute Fleet Management

$345K - $399K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of experience in large-scale distributed systems and infrastructure.
  • Demonstrated ability to be a technical anchor for an organization, guiding multiple teams in direction setting.
  • Proficiency in Go with experience designing and operating production services at scale.
  • Hands-on expertise in building Kubernetes-style control planes and reconciliation patterns.
  • Familiarity with gRPC for service APIs and strong knowledge of SQL/Postgres for state management.
  • Experience managing compute capacity across both on-prem and cloud environments.
  • Proven track record of cross-functional collaboration to align compute supply and demand.

Responsibilities

  • Lead the technical direction for three Fleet Management pods, ensuring alignment across provisioning, data planes, and control planes.
  • Architect Kubernetes-style control planes for the compute fleet, focusing on scaling and capacity management.
  • Design internal customer contracts and APIs for automation across the fleet, enabling safe operations.
  • Develop strategies for self-service capacity management through user-facing products and UIs.
  • Enhance security, maintenance, and uptime across Kubernetes clusters, ensuring reliable production changes.
  • Collaborate with various stakeholders to understand compute needs and foster innovation.
  • Engage in daily coding and problem-solving within the systems managed by your organization.

Benefits

  • Equity compensation for all full-time employees.
  • Flexible office attendance policy: onsite three days per week with optional presence on others.
  • Comprehensive benefits package as outlined in company information.

Full Job Description

As a Principal Software Engineer leading Fleet Management, you will be the overall technical lead across three pods and the person who sets the technical direction for the fleet management layer of Roblox. This is a hands-on, deeply technical leadership role that owns all of Roblox's compute capacity end to end: from low-level provisioning and the data plane, up through the control planes that operate it, and all the way to the UI and internal-facing products that let teams self-serve capacity. Your org centralizes security, maintenance operations, and the uptime of every Roblox Kubernetes cluster, and governs the internal customer contracts that drive automation across the fleet spanning Roblox data centers and cloud providers. You will guide architecture, raise the engineering bar, and make sure compute capacity supply and demand stay in balance as the fleet grows.

You will:

  • Serve as the overall technical lead for three Fleet Management pods, setting and aligning the technical direction across low-level provisioning, the data plane, and the control plane and product surfaces above them.
  • Architect the declarative, Kubernetes-style control planes that operate Roblox's compute fleet across on-prem and cloud, and define how capacity is provisioned, reconciled, and exposed at scale.
  • Own the design of the internal customer contracts and APIs that govern automation across the fleet, so that every infrastructure team can operate capacity safely and predictably.
  • Drive the strategy for self-serve capacity, including the internal-facing products and UIs that let teams request, manage, and reason about the compute they depend on.
  • Centralize and raise the bar on security, maintenance operations, and the uptime of all Roblox Kubernetes clusters, defining how fleet-wide changes ship reliably without impacting production.
  • Partner broadly with stakeholders inside and outside infrastructure to understand compute needs and drive innovation for our backend services, AI, and edge computing.
  • Write code daily, staying deep in the systems your org owns and leading by example on the hardest design and implementation problems.

You Have:

  • 10+ years of experience building and operating large-scale distributed systems and infrastructure.
  • A track record as the technical anchor an organization relies on, with the leadership to set direction across multiple teams and up-level the engineers around you.
  • Strong proficiency in Go, with deep experience designing and operating production services at fleet scale.
  • Hands-on experience building declarative, Kubernetes-style control planes and the reconciliation patterns behind them.
  • Strong proficiency with gRPC for service-to-service APIs and with SQL and Postgres for durable, high-scale state.
  • Experience operating compute capacity across both on-prem data centers and cloud providers, and a feel for the realities of running fleets at the scale of hundreds of thousands of instances.
  • A history of being highly cross-functional, partnering with stakeholders across and beyond infrastructure to design systems that keep compute supply and demand in balance.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range: $345,040-$399,420 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

About ROBLOX Corporation

Roblox Corporation is a video game company that operates a massively multiplayer online game platform. The platform allows users to create and play games in a virtual world, with a focus on user-generated content. Roblox was founded in 2004 and is headquartered in San Mateo, California. The company has grown rapidly in recent years, and now has over 100 million monthly active users. In 2021, Roblox went public through a direct listing on the New York Stock Exchange.
Size: 960 employees
Market Cap: $15.6 billion
Net Income: -$242.8 million
Founded: 2004
Revenue: $727 million
Stock Exchange: NYSE
