About the Team:
As a Staff MLOps Engineer, you will build and own the infrastructure, tooling, and scalable systems that make high-impact AI possible. You'll architect and maintain the platforms that power data ingestion, feature computation, model training, automated evaluation, deployment, and ongoing monitoring for the ML teams building recommendations, LLM-based experiences, ads, visual search, growth, and trust & safety. You will design foundational systems that allow our ML engineers to experiment faster, ship models more reliably, and operate them with confidence in production.
This is a hybrid role based in our Bay Area (SF or Palo Alto) or Chicago offices, and it requires you to be in the office on Tuesdays and Thursdays.
About the Job:
- Build and maintain end-to-end ML pipelines for data ingestion, feature computation, model training, validation, deployment, and inference, all at substantial data scale.
- Stand up and manage a feature store, ensuring feature consistency, lineage, and reuse across teams.
- Apply best-in-class tools for managing deployment, scheduling, and environments to the specialized regime of ML infrastructure.
- Develop automated model deployment workflows with CI/CD, safe rollout strategies, and reproducibility guarantees.
- Implement monitoring and observability for ML systems, including data quality checks, drift detection, performance metrics, and alerting.
- Build and support training environments with experiment tracking, distributed training, hyperparameter tuning, and artifact and environment management.
- Collaborate with ML engineers and data engineers to streamline workflows, improve model iteration speed, and enforce MLOps best practices.
- Ensure reliability, scalability, and maintainability of ML systems through strong engineering and operational rigor.
Role Requirements:
- Bachelor's degree in CS, Engineering, Mathematics, or a related field.
- 5+ years of experience in MLOps, ML platform engineering, ML infrastructure, or similar roles.
- Strong experience building production ML pipelines and supporting end-to-end ML workflows.
- Excellent engineering fundamentals: Python, SQL, bash, Git.
- Experience with big data and distributed compute: Snowflake, Spark/PySpark, Airflow, Kubernetes, Docker, Helm.
- Experience with ML frameworks (PyTorch, TensorFlow) sufficient to support training pipelines and deployment workflows.
- Strong understanding of cloud platforms (AWS, GCP, or Azure).
- Ability to produce well-engineered, maintainable software with tests, documentation, and operational rigor.
- Experience with data quality frameworks, observability tooling, or experiment tracking systems.
You May Thrive in this Role if You:
- Have implemented full model lifecycle management (from data → training → deployment → monitoring).
- Have experience with vector databases, embeddings pipelines, or retrieval systems.
- Are familiar with NLP/LLM-based data pipelines or image/vision data workflows.
- Have experience with recommendation system infrastructure.
- Have a strong grasp of classical ML concepts as they relate to platform design.
- Know data governance, compliance, retention, and classification.
- Have a track record of partnering with research/ML teams to operationalize models at scale.
Benefits and Perks:
- Health, Dental & Vision: Full premium coverage for you. Partial coverage for dependents.
- Family Formation: Up to $200,000 in fertility and family-building support, covering IVF, surrogacy, egg freezing, and adoption.
- Retirement: 401(k) with 6% match and immediate vesting.
- Compensation: Industry-competitive compensation, company bonus, and equity for every employee.
- Gender-Affirming Care: Industry-leading gender-affirming offerings with up to 90% cost coverage, access to Included Health, monthly stipends for HRT, and more.
- Time Off & Rest: Flexible vacation policy. Two company-wide rest weeks per year.
- Other Benefits: Monthly stipends for cell phone, internet, wellness, food, and commuting, plus breakfast/lunch.