Databricks

Staff Software Engineer - AI Research Infrastructure

Databricks: $190K - $270K
Enterprise Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • BS/MS or PhD in Computer Science or related field
  • 5+ years of software engineering experience with large-scale distributed systems
  • Deep experience building and operating distributed systems or backend services, preferably with GPUs
  • Proficient in systems programming languages like C++, Rust, Go, Java, or Scala
  • Experience with cluster schedulers or orchestration systems (e.g., Kubernetes, Slurm)
  • Understanding of modern ML training and inference workflows
  • Ability to communicate effectively with researchers and engineers

Responsibilities

  • Design and implement infrastructure for large-scale experiments and model training
  • Enable rapid experimentation by building abstractions for job submission and scheduling
  • Create tooling to enhance productivity in research development
  • Influence the research computation roadmap at Databricks
  • Serve as a mentor for engineers in compute and AI systems

Benefits

  • Comprehensive benefits and perks
  • Eligibility for annual performance bonus
  • Equity opportunities
  • Supportive work environment promoting career development

Full Job Description

P-1215

Job Description

As a Staff Software Engineer, AI Research Infrastructure, you will develop and run the research stack that powers Databricks AI Research. You will design and build services that schedule, orchestrate, and observe large-scale training and inference experiment workloads across thousands of GPUs, improve our dev tooling, and ensure that researchers can iterate quickly without sacrificing reliability, efficiency, or security.

You'll partner closely with research scientists, ML engineers, and platform teams to turn experimental workloads into robust, repeatable pipelines, and to push the limits of what our infrastructure can support.

The Impact you will have

As a Staff Software Engineer on the AI Research Infra Team at Databricks, you will:
  • Design and implement infrastructure that supports large-scale experiments, data processing, and model training (e.g., HPC clusters, GPU fleets, or cloud-based systems).
  • Enable researchers to go from idea to large-scale experiment in minutes, not days, by building powerful abstractions for job submission, scheduling, and monitoring.
  • Create tooling that improves research developer productivity, such as experiment management systems, CI/testing infrastructure for research code, and workflows that reduce iteration time.
  • Influence the long-term roadmap for research computation, shaping how Databricks AI Research trains, evaluates, and ships models to customers.
  • Serve as a technical mentor and force multiplier for other engineers working on compute, infra, and AI systems.

What We Look for
  • Have a BS/MS or PhD in Computer Science or a related field.
  • Have 5+ years of software engineering experience, including substantial time working on large-scale distributed systems or infrastructure.
  • Have deep experience with building and operating distributed systems, data pipelines, or large-scale backend services, ideally involving GPUs, clusters, or major cloud providers.
  • Are proficient in one or more systems programming languages (e.g., C++, Rust, Go, Java, Scala) and can design, implement, and debug complex services.
  • Have built or significantly contributed to cluster schedulers, resource managers, or large-scale job orchestration systems (e.g., Kubernetes, Slurm, Ray, custom internal systems).
  • Understand modern ML training and inference workflows (e.g., distributed training, model parallelism, fine-tuning, evaluation), even if you're not primarily a research scientist.
  • Can move fast and be pragmatic in getting things done, while caring about operational excellence. Have driven complex systems from prototype to stable, well-owned services.
  • Communicate clearly with both researchers and engineers, and enjoy translating between research needs and infra realities.


Pay Range Transparency

Databricks is committed to fair and equitable compensation practices. The pay range for this role is listed below and represents the expected salary range for non-commissionable roles or on-target earnings for commissionable roles. Actual compensation packages are based on several factors that are unique to each candidate, including but not limited to job-related skills, depth of experience, relevant certifications and training, and specific work location. Based on the factors above, Databricks anticipates utilizing the full width of the range. The total compensation package for this position may also include eligibility for annual performance bonus, equity, and the benefits listed above. For more information regarding which range your location is in, visit our page here.

Local Pay Range

$190,000-$270,000 USD

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, click here.

About Databricks

Databricks is a unified analytics platform that provides data engineering, collaborative data science, and machine learning capabilities. The company was founded in 2013 by the original creators of Apache Spark, a popular open-source big data processing engine. Databricks provides a cloud-based platform that allows data teams to collaborate and build data pipelines, run machine learning models, and perform advanced analytics. The company has raised over $1 billion in funding and is valued at $38 billion as of November 2021.
Size: 2,000 employees
Founded: 2013
