Member of Technical Staff, Post-Training, RL Environments

Mirendil

$350K-$500K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in research engineering or a related field
  • Strong background in machine learning, particularly reinforcement learning
  • Proficiency in designing and building data pipelines
  • Experience with system optimization and infrastructure development
  • Familiarity with collaborative development environments and tools
  • Ability to work in a team-oriented and cross-functional setting

Responsibilities

  • Build and automate data collection pipelines for complex RL tasks
  • Develop systems to identify and prevent reward hacking
  • Create scalable and sandboxed execution environments for multi-agent tasks
  • Design systems to evaluate how training environments affect model behavior
  • Collaborate across teams to enhance production model performance
  • Drive initiatives to continuously improve data and environment quality

Benefits

  • Meaningful equity grant based on experience
  • Comprehensive health, dental, and vision insurance
  • Generous vacation policies
  • Opportunities for professional development and training
  • Flexible work arrangements

Full Job Description

The Role

We are looking for a research engineer to build the data systems and execution environments that power reinforcement learning at Mirendil. The quality of our models depends directly on the quality of the data and environments we train on; you will own those systems end-to-end. Example areas you might work on include, but are not limited to:

  • Build and automate data collection pipelines for complex, long-horizon RL tasks.
  • Build robust systems to identify and prevent reward hacking.
  • Build scalable sandboxed execution environments for realistic tasks involving potentially multiple agents, nodes, and users.
  • Design systems to estimate the influence of training environments on production model behavior.
  • Collaborate with teams across the stack to identify potential axes of improvement in production model behavior, and develop training environments that push along these axes.


If you're excited about building the data and environment infrastructure that determine what our models learn, we'd love to hear from you.

We offer a base salary of $350,000-$500,000 USD and a meaningful equity grant, depending on experience and background, along with competitive benefits.
