MLOps Engineer

Stanford Health Care

$165K — $218K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's or higher degree in Computer Science, Engineering or a related field.
  • Three or more years of directly related experience as an MLOps Engineer.
  • Strong knowledge of cloud platforms (AWS, Azure, Google Cloud) and infrastructure-as-code tools (Terraform, CloudFormation).
  • Proficiency in Docker and Kubernetes for container orchestration.
  • Solid programming skills in Python, Rust or Go, with experience in scripting and automation.
  • Familiarity with machine learning frameworks (PyTorch, TensorFlow, scikit-learn).
  • Deep understanding of DevOps principles, agile methodologies, and software development lifecycle.

Responsibilities

  • Design, build and maintain scalable infrastructure for AI/ML systems.
  • Develop and implement CI/CD pipelines for AI/ML models and applications.
  • Collaborate with cross-disciplinary teams to optimize model training and deployment.
  • Monitor and troubleshoot AI/ML systems for performance and reliability.
  • Maintain training and inference pipelines across multi-cloud environments.
  • Manage Kubernetes pods and model registries.
  • Implement security best practices in AI/ML workflows.

Benefits

  • Mentorship and technical guidance opportunities for junior team members.
  • Access to cutting-edge AI and ML technologies in healthcare.
  • Collaboration with leaders in clinical specialties and AI research.
  • Opportunities to shape the infrastructure for innovative healthcare solutions.

Full Job Description
Day - 08 Hour (United States of America)

We are seeking a high-caliber Senior AI Platform & MLOps Engineer to architect the "layered" infrastructure required for autonomous, agentic systems within Stanford Health Care. In this role, you will be the "Master Chef" of our AI ecosystem, seamlessly folding expert-level DevOps (Kubernetes, Terraform, DevOps orchestration) together with agentic application development (LangGraph, CrewAI, tool-calling logic). You won't just manage servers; you will build the robust, full-stack "factory" where multi-agent frameworks interact with healthcare APIs, ensuring every autonomous action is governed by strict MLOps observability (LangSmith, Arize) and safety guardrails. If you have the "crispy" coding skills to build RAG pipelines in Python, the "rich" architectural depth to deploy scalable microservices, and extensive full-stack software development expertise, we want you to lead the integration of reasoning-based AI into the future of clinical and business workflow automation.

This is a Stanford Health Care job.

A Brief Overview
The MLOps Engineer will play an integral role in incorporating Artificial Intelligence (AI) within Stanford Health Care. The solutions will impact patient care, medical research, and operational services. This group is tasked with innovating, building, deploying and monitoring production-grade AI, machine learning (ML) and predictive algorithms in healthcare. The role will partner closely with lead researchers in the AI field and with leaders across various clinical specialties and operations.

This role will report to the Infrastructure group and have a dotted-line relationship to the Data Science team. The role will be responsible for maintaining cloud-based infrastructure-as-code repositories, infrastructure and deployment pipelines, and for designing the security landscape for the team and objects. The role will set the standards for the full SDLC of projects for the Data Science team.

Locations
Stanford Health Care

What you will do
  • Design, build and maintain scalable and robust infrastructure for AI/ML systems, including cloud-based environments, containerization and orchestration platforms.
  • Develop and implement CI/CD pipelines to automate the deployment, testing and monitoring of AI/ML models and applications.
  • Collaborate with data scientists, data engineers and software engineers to optimize model training, deployment and inference pipelines.
  • Monitor and troubleshoot AI/ML systems to ensure high availability, performance and reliability.
  • Maintain and monitor model training and inference pipelines across multi-cloud tenants, especially for Large Language Models (LLMs).
  • Maintain Kubernetes pods, the container registry, the virtual machine image library and the model registry.
  • Monitor infrastructure utilization and costs pertaining to model training, inference and GPU utilization.
  • Implement best practices for security, data privacy and compliance in AI/ML workflows and infrastructure.
  • Evaluate and integrate new tools, technologies and frameworks to improve the efficiency and effectiveness of our MLOps processes.
  • Mentor and provide technical guidance to junior members of the organization.
  • Stay up-to-date with the latest advancements and trends in MLOps, DevOps and cloud technologies and share them with the team.


Education Qualifications
  • Bachelor's or higher degree in Computer Science, Engineering or a related field


Experience Qualifications
  • Three (3) or more years of directly related experience


Required Knowledge, Skills and Abilities
  • Proven experience as an MLOps Engineer.
  • Strong knowledge of cloud platforms such as AWS, Azure or Google Cloud and experience with infrastructure-as-code tools like Terraform or CloudFormation.
  • Proficiency in containerization technologies such as Docker and container orchestration platforms like Kubernetes.
  • Experience with CI/CD tools such as GitLab CI/CD, GitHub Actions or CircleCI.
  • Solid programming skills in languages such as Python, Rust or Go and experience in scripting and automation.
  • Familiarity with machine learning frameworks and libraries such as PyTorch, TensorFlow and scikit-learn.
  • Deep understanding of DevOps principles, agile methodologies and the software development lifecycle.
  • Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex technical issues.
  • Excellent communication and collaboration skills with the ability to work effectively in cross-functional teams.


Physical Demands and Work Conditions
Blood Borne Pathogens
  • Category III - Tasks that involve NO exposure to blood, body fluids or tissues, and Category I tasks that are not a condition of employment


These principles apply to ALL employees:

SHC Commitment to Providing an Exceptional Patient & Family Experience

Stanford Health Care sets a high standard for delivering value and an exceptional experience for our patients and families. Candidates for employment and existing employees must adopt and execute C-I-CARE standards for all patients and families and towards each other. C-I-CARE is the foundation of Stanford's patient experience and represents a framework for patient-centered interactions. Simply put, we do what it takes to enable and empower patients and families to focus on health, healing and recovery.

You will do this by executing against our three experience pillars, from the patient and family's perspective:

  • Know Me: Anticipate my needs and status to deliver effective care
  • Show Me the Way: Guide and prompt my actions to arrive at better outcomes and better health
  • Coordinate for Me: Own the complexity of my care through coordination


Base Pay Scale: Generally starting at $79.21 - $104.97 per hour

The salary of the finalist selected for this role will be set based on a variety of factors, including but not limited to, internal equity, experience, education, specialty and training. This pay scale is not a promise of a particular wage.
