Principal Software Engineer, Compute Fleet Management

ROBLOX Corporation • $345K — $399K *

San Mateo, CA 94403 • In-Person

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

10+ years of experience in large-scale distributed systems and infrastructure.
Demonstrated ability to be a technical anchor for an organization, guiding multiple teams in direction setting.
Proficiency in Go with experience designing and operating production services at scale.
Hands-on expertise in building Kubernetes-style control planes and reconciliation patterns.
Familiarity with gRPC for service APIs and strong knowledge of SQL/Postgres for state management.
Experience managing compute capacity across both on-prem and cloud environments.
Proven track record of cross-functional collaboration to align compute supply and demand.

Responsibilities

Lead the technical direction for three Fleet Management pods, ensuring alignment across provisioning, data planes, and control planes.
Architect Kubernetes-style control planes for the compute fleet, focusing on scaling and capacity management.
Design internal customer contracts and APIs for automation across the fleet, enabling safe operations.
Develop strategies for self-service capacity management through user-facing products and UIs.
Enhance security, maintenance, and uptime across Kubernetes clusters, ensuring reliable production changes.
Collaborate with various stakeholders to understand compute needs and foster innovation.
Engage in daily coding and problem-solving within the systems managed by your organization.

Benefits

Equity compensation for all full-time employees.
Hybrid office attendance: onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday.
Comprehensive benefits package as outlined in company information.

Full Job Description

As a Principal Software Engineer leading Fleet Management, you will be the overall technical lead across three pods and the person who sets the technical direction for the fleet management layer of Roblox. This is a hands-on, deeply technical leadership role that owns all of Roblox's compute capacity end to end: from low-level provisioning and the data plane, up through the control planes that operate it, and all the way to the UI and internal-facing products that let teams self-serve capacity.

Your org centralizes security, maintenance operations, and the uptime of every Roblox Kubernetes cluster, and governs the internal customer contracts that drive automation across the fleet spanning Roblox data centers and cloud providers. You will guide architecture, raise the engineering bar, and make sure compute capacity supply and demand stay in balance as the fleet grows.

You will:
• Serve as the overall technical lead for three Fleet Management pods, setting and aligning the technical direction across low-level provisioning, the data plane, and the control plane and product surfaces above them.
• Architect the declarative, Kubernetes-style control planes that operate Roblox's compute fleet across on-prem and cloud, and define how capacity is provisioned, reconciled, and exposed at scale.
• Own the design of the internal customer contracts and APIs that govern automation across the fleet, so that every infrastructure team can operate capacity safely and predictably.
• Drive the strategy for self-serve capacity, including the internal-facing products and UIs that let teams request, manage, and reason about the compute they depend on.
• Centralize and raise the bar on security, maintenance operations, and the uptime of all Roblox Kubernetes clusters, defining how fleet-wide changes ship reliably without impacting production.
• Partner broadly with stakeholders inside and outside infrastructure to understand compute needs and drive innovation for our backend services, AI, and edge computing.
• Write code daily, staying deep in the systems your org owns and leading by example on the hardest design and implementation problems.

You Have:
• 10+ years of experience building and operating large-scale distributed systems and infrastructure.
• A track record as the technical anchor an organization relies on, with the leadership to set direction across multiple teams and up-level the engineers around you.
• Strong proficiency in Go, with deep experience designing and operating production services at fleet scale.
• Hands-on experience building declarative, Kubernetes-style control planes and the reconciliation patterns behind them.
• Strong proficiency with gRPC for service-to-service APIs and with SQL and Postgres for durable, high-scale state.
• Experience operating compute capacity across both on-prem data centers and cloud providers, and a feel for the realities of running fleets at the scale of hundreds of thousands of instances.
• A history of being highly cross-functional, partnering with stakeholders across and beyond infrastructure to design systems that keep compute supply and demand in balance.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range: $345,040-$399,420 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

About ROBLOX Corporation

Roblox Corporation is a video game company that operates a massively multiplayer online game platform. The platform allows users to create and play games in a virtual world, with a focus on user-generated content. Roblox was founded in 2004 and is headquartered in San Mateo, California. The company has grown rapidly in recent years, and now has over 100 million monthly active users. In 2021, Roblox went public through a direct listing on the New York Stock Exchange.

Size

960 employees

Market Cap

$15.6 billion

Industry

Retail & Consumer Goods

Net Income

-$242.8 million

Founded

2004

Revenue

$727 million

NYSE

RBLX

* Ladders Estimates
