Staff Engineer, Lustre

Data Direct Networks

$185K - $230K *

US-Anywhere | Remote in California, US

Information Technology

8 - 10 years of experience

Today

Job Overview by Ladders

Qualifications

10+ years of experience in systems software, distributed systems, storage, Linux kernel or filesystem engineering.
Strong background in LustreFS development or performance engineering with expertise in at least one subsystem.
Proficient in C programming and Linux systems debugging.
In-depth knowledge of Linux kernel internals and filesystem performance analysis.
Experience with LNet and/or high-performance transports such as RDMA or InfiniBand.
Ability to troubleshoot issues across multiple technology layers, from client to backend storage.
Excellent collaboration skills to work effectively in a fast-paced engineering environment.

Responsibilities

Design, develop, and debug features and enhancements for LustreFS across various subsystems.
Investigate customer issues, conduct root-cause analysis, and implement reliable fixes.
Contribute to performance tuning and reliability improvements for large-scale deployments.
Participate in code and design reviews to ensure high testing standards and operational readiness.
Collaborate with QE and support to reproduce issues and improve the quality of diagnostic data.
Document subsystem behavior, debugging methods, and operational best practices.
Leverage AI-assisted tools to streamline debugging and improve code comprehension.

Benefits

Dynamic and driven team environment encouraging engineering excellence.
Opportunities for cross-functional work due to a flat organizational structure.
Potential for leadership roles through initiative and outstanding delivery.
Emphasis on strong communication skills for team success.

Full Job Description

We are seeking a Staff Engineer – LustreFS with 10+ years of experience in distributed storage and Linux-based systems engineering. This is a hands-on senior technical role focused on design, debugging, performance, and operational excellence across LustreFS and adjacent stack components. The ideal candidate brings strong expertise in one or more Lustre subsystems, can independently drive complex investigations, and collaborates effectively across engineering, QE, support and release teams. Engineers who are comfortable using AI to accelerate triage, debugging, code comprehension and new feature design will be especially valuable.

Key Responsibilities

Design, develop and debug LustreFS features, fixes and enhancements across relevant subsystems such as llite, MDS/MDT, OSS/OST, LDLM and LNet.
Investigate customer and scale-related defects, drive root-cause analysis and implement high-quality fixes with strong attention to correctness and maintainability.
Contribute to performance tuning, failure analysis and reliability improvements for large-scale Lustre deployments.
Participate actively in code reviews, design reviews and subsystem discussions, bringing rigor to testing and operational readiness.
Work closely with QE and support to reproduce issues, improve diagnostic data quality and increase coverage for high-risk failure scenarios.
Help document subsystem behavior, debugging approaches, known failure patterns and operational best practices.
Use AI-assisted tools where appropriate to speed up issue triage, summarize logs, improve code understanding and capture reusable lessons learned.

Required Qualifications

10+ years of experience in systems software, distributed systems, storage, Linux kernel or filesystem engineering.
Strong experience in LustreFS development, support or performance engineering with depth in at least one major subsystem.
Strong C programming and Linux systems debugging skills.
Working knowledge of Linux kernel internals, filesystem semantics, networking and performance analysis.
Experience with LNet and/or high-performance transports such as RDMA, InfiniBand, RoCE or TCP-based storage networking.
Ability to debug and resolve issues spanning multiple layers including client, server, network and backend storage.
Strong collaboration skills and the ability to work across functions in a fast-moving engineering environment.

Preferred Skills

Experience in HPC, AI infrastructure or large-scale parallel storage environments.
Exposure to metadata-heavy and throughput-heavy workload characterization and tuning.
Familiarity with ZFS, ldiskfs, NVMe-backed storage and related observability / performance tooling.
Experience creating test plans, reproducer frameworks, runbooks or diagnostic automation.
Comfort using AI tools to accelerate debugging, code reviews, triage, documentation and early-stage design ideation.
Experience mentoring junior engineers or leading focused technical efforts within a subsystem.

What You Will Work On

Hands-on development and debugging of LustreFS defects, performance issues and subsystem enhancements.
Customer-facing and scale-related issue investigation across llite, metadata, object storage, LNet and transport layers.
Collaborative design and implementation of reliability, observability and serviceability improvements.
Reviewing and validating fixes through targeted tests, failure injection, log analysis and performance characterization.
Using AI-assisted workflows to accelerate triage, debug loops, code understanding and documentation quality.
Contributing to team redundancy by strengthening documentation, code review quality and subsystem knowledge sharing.

Why This Role Matters

This role is central to building durable engineering redundancy in LustreFS: expanding deep subsystem ownership, reducing concentration risk, and accelerating next-generation delivery through strong engineering fundamentals and AI-enabled execution.

Salary Range for this role: $185,000 - $230,000

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently stand out, both in their work ethic and in what they deliver, which makes strong prioritization skills essential. We also value strong communication skills in all our engineers and researchers, as they are crucial to the success of our teams and the company as a whole.

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

Coding assessment: Often in a language of your choice.
Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
Meet and greet with the wider team.
Our goal is to finish the main process in 2-3 weeks at most.

* Ladders Estimates

Similar Jobs

Staff Software Engineer - ServiceNow ITAM
$154K - $199K *
The Walt Disney Company
Orlando, FL 32828 (Orange County)
Reposted Today

Staff Software Engineer - ServiceNow ITAM
$154K - $199K *
The Walt Disney Company
Burbank, CA 91505 (Los Angeles County)
Reposted Today

Staff Software Engineer - ServiceNow ITAM
$154K - $199K *
The Walt Disney Company
New York, NY 10025 (New York County)
Reposted Today

Staff Software Engineer - ServiceNow ITAM
$154K - $199K *
The Walt Disney Company
Seattle, WA 98115 (King County)
Reposted Today

Staff Software Development Engineer
$156K - $236K *
CVS Health
Richardson, TX 75080 (Dallas County)
Today

Staff Software Development Engineer
$157K - $236K *
CVS Health
Scottsdale, AZ 85254 (Maricopa County)
Today

More Jobs at Data Direct Networks

Sr. Staff Engineer, Lustre
$215K - $265K *
Remote
Today
Information Technology
Remote in California, US

Data Scientist / Data Architect
$215K - $265K *
Remote
Today
Information Technology
Remote in California, US

Staff Engineer in Test
$145K - $185K *
Remote
Today
Information Technology
Remote in California, US

Staff Engineer, Lustre
$185K - $230K *
Remote
Today
Information Technology
Remote in California, US

Account Executive - FSI
$150K - $170K *
Remote
Reposted Yesterday
Finance & Insurance
Remote

More Information Technology Jobs

SDET (Software Development Engineer In Test)
Confidential Company
Washington, DC 20001 (District Of Columbia County)
5 days ago

Senior Data Architect
$130K - $160K *
Hilti, Inc
Plano, TX 75025 (Collin County)
Reposted Today

Software Engineer II (Full Stack, Platform)
$125K - $175K *
WHOOP
Boston, MA 02115 (Suffolk County)
Today

SOFTWARE ENGINEER - Gitlab/DevOps - 7+ yrs of Experience - TS/SCI w/Poly clearance is required - ID
$234K - $241K *
Halogen Engineering
Annapolis, MD 21401 (Anne Arundel County)
Today

GAI Expert Software Engineer - 69241141
$112K - $132K *
Cognizant
Charlotte, NC 28269 (Mecklenburg County)
Today
