Databricks

Staff Software Engineer - AI Research Infrastructure

Databricks
$190K - $270K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • BS/MS or PhD in Computer Science or related field
  • 5+ years of software engineering experience, particularly in large-scale distributed systems
  • Deep experience in building and operating distributed systems and data pipelines
  • Proficiency in systems programming languages (C++, Rust, Go, Java, Scala)
  • Experience with cluster schedulers or job orchestration systems (Kubernetes, Slurm, Ray)
  • Understanding of modern ML training and inference workflows
  • Ability to communicate clearly with both researchers and engineers

Responsibilities

  • Design and implement infrastructure for large-scale experiments and model training
  • Enable researchers to move quickly from ideas to large-scale experiments
  • Create tooling to improve developer productivity for research tasks
  • Influence the long-term roadmap for research computation at Databricks
  • Serve as a technical mentor for engineers working on AI systems

Benefits

  • Comprehensive benefits package to meet employee needs
  • Eligibility for annual performance bonuses
  • Access to equity options
  • Supportive work culture promoting collaboration and innovation

Full Job Description

P-1215

Job Description

As a Staff Software Engineer, AI Research Infrastructure, you will develop and run the research stack that powers Databricks AI Research. You will design and build services that schedule, orchestrate, and observe large-scale training and inference experiment workloads across thousands of GPUs, improve our dev tooling, and ensure that researchers can iterate quickly without sacrificing reliability, efficiency, or security.

You'll partner closely with research scientists, ML engineers, and platform teams to turn experimental workloads into robust, repeatable pipelines, and to push the limits of what our infrastructure can support.

The Impact you will have

As a Staff Software Engineer on the AI Research Infra Team at Databricks, you will:
  • Design and implement infrastructure that supports large-scale experiments, data processing, and model training (e.g., HPC clusters, GPU fleets, or cloud-based systems).
  • Enable researchers to go from idea to large-scale experiment in minutes, not days, by building powerful abstractions for job submission, scheduling, and monitoring.
  • Create tooling that improves research developer productivity, such as experiment management systems, CI/testing infrastructure for research code, and workflows that reduce iteration time.
  • Influence the long-term roadmap for research computation, shaping how Databricks AI Research trains, evaluates, and ships models to customers.
  • Serve as a technical mentor and force multiplier for other engineers working on compute, infra, and AI systems.

What We Look for
  • BS/MS or PhD in Computer Science or related field
  • 5+ years of software engineering experience, including substantial time working on large-scale distributed systems or infrastructure.
  • Deep experience building and operating distributed systems, data pipelines, or large-scale backend services, ideally involving GPUs, clusters, or major cloud providers.
  • Proficiency in one or more systems programming languages (e.g., C++, Rust, Go, Java, Scala) and the ability to design, implement, and debug complex services.
  • Experience building or significantly contributing to cluster schedulers, resource managers, or large-scale job orchestration systems (e.g., Kubernetes, Slurm, Ray, custom internal systems).
  • Understanding of modern ML training and inference workflows (e.g., distributed training, model parallelism, fine-tuning, evaluation), even if you're not primarily a research scientist.
  • Ability to move fast and be pragmatic in getting things done while caring about operational excellence, with a track record of driving complex systems from prototype to stable, well-owned services.
  • Clear communication with both researchers and engineers, and enjoyment in translating between research needs and infra realities.


Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on these factors, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for an annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in, visit our page here.

Local Pay Range

$190,000-$270,000 USD

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, click here.

About Databricks

Databricks is a unified analytics platform that provides data engineering, collaborative data science, and machine learning capabilities. The company was founded in 2013 by the original creators of Apache Spark, a popular open-source big data processing engine. Databricks provides a cloud-based platform that allows data teams to collaborate and build data pipelines, run machine learning models, and perform advanced analytics. The company has raised over $1 billion in funding and is valued at $38 billion as of November 2021.
Size: 2,000 employees
Founded: 2013
