Roku

Senior Machine Learning Engineer

Roku, $130K–$160K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • BS or MS in Computer Science, Engineering, or a related field
  • 8+ years in DevOps or ML infrastructure, with 4+ years in large-scale ML/AI systems
  • Strong programming skills in Python, Scala, or Java
  • Deep experience with Kubernetes on GCP and/or AWS
  • Expertise in NoSQL databases and low-latency technologies
  • Hands-on experience with data orchestration tools like Apache Airflow and Apache Spark
  • Infrastructure-as-code experience with Terraform

Responsibilities

  • Lead design of scalable cloud infrastructure for ML workloads on AWS and GCP
  • Improve CI/CD systems for reliable production model releases
  • Evolve infrastructure for low-latency model inference
  • Define observability standards for ML system monitoring
  • Conduct incident response and root-cause analysis for ML infrastructure
  • Collaborate with data scientists to enhance platform usability
  • Champion operational excellence through automation and continuous improvement

Benefits

  • Global access to mental health and financial wellness support
  • Comprehensive healthcare options (medical, dental, vision)
  • Local benefits including disability and life insurance
  • 401(k)/pension retirement options
  • Support for taking time off for personal needs, in line with local leave policies
Full Job Description
About the team

The Advertising Performance group focuses on performance for all participants in the Advertising ecosystem - Advertisers, Publishers, and Roku. The systems and solutions span different disciplines and technologies to perform real-time multi-objective optimization with distributed systems at large scale and low latencies. We use Machine Learning, Reinforcement Learning, AI, Control and Optimization Systems, and Auction Dynamics to solve a large set of complex problems. At the core of this is our Machine Learning, Experimentation and Inference Platform that powers the entire landscape, which we continuously evolve over time.

About the role

We are seeking a talented and experienced Senior Software Engineer, MLOps/DevOps, to join the Advertising Performance team and play a critical role in supporting and scaling our Machine Learning infrastructure. The ideal candidate has a strong background in DevOps/SRE practices, cloud infrastructure management, and MLOps tooling - with a passion for building platforms that accelerate ML experimentation and deployment at internet scale.

You will partner closely with ML Scientists and Engineers to streamline the end-to-end ML lifecycle across training, evaluation, deployment, and monitoring - on top of a modern, cloud-native stack running on GCP and AWS using Kubernetes, Apache Airflow, Spark, Ray, MLflow, Chronon, etc.

What you'll be doing
  • Lead the design and operation of scalable, production-grade cloud infrastructure for ML workloads across AWS and GCP, including GPU/TPU-based training and inference environments
  • Architect and improve CI/CD systems for ML models and platform services to enable fast, reliable, and safe production releases
  • Own and evolve low-latency infrastructure for real-time model inference, including key-value (KV) stores and vector databases
  • Define and enforce observability standards for ML systems, including model performance monitoring, drift detection, capacity planning, and pipeline health metrics
  • Participate in the on-call rotation, leading incident response and root-cause analysis for critical ML training and serving infrastructure
  • Partner with data scientists and ML engineers to improve platform usability, accelerate model iteration, and implement strong MLOps and SRE best practices
  • Champion operational excellence across ML infrastructure through automation, resilience engineering, disaster recovery planning, and continuous improvement


We're excited if you have
  • BS or MS in Computer Science, Engineering, or a related quantitative field
  • 8+ years of experience in DevOps, SRE, or ML infrastructure, including 4+ years supporting large-scale ML or AI systems
  • Strong programming skills in Python and/or Scala or Java for platform automation and tooling
  • Deep experience with Kubernetes and container orchestration on GCP (GKE) and/or AWS (EKS)
  • Expertise with NoSQL or low-latency data stores such as Aerospike or similar technologies
  • Hands-on experience with data and orchestration technologies such as Apache Spark, Apache Flink, Apache Airflow, and Kafka
  • Experience building and maintaining CI/CD systems using tools such as Jenkins or GitLab Runner
  • Familiarity with feature engineering platforms such as Chronon and model lifecycle tools such as MLflow
  • Strong infrastructure-as-code experience with Terraform or similar tooling
  • Experience with observability platforms such as Prometheus, Grafana, and Datadog
  • Excellent communication and cross-functional collaboration skills
  • Experience in the Advertising domain is a plus


Our Hybrid Work Approach

Roku fosters an inclusive and collaborative environment where teams work in the office Monday through Thursday. Fridays are flexible for remote work, except for employees whose roles require them to be in the office five days a week or who work in offices with a five-day in-office policy.

Benefits

Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits, which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Employees are supported in taking time off, in accordance with local leave policies, to meet personal needs and their evolving work and life needs. Not every benefit is available in all locations or for every role. For details specific to your location, please consult your recruiter.

Accommodations

Roku welcomes applicants of all backgrounds and provides reasonable accommodations and adjustments in accordance with applicable law. If you require reasonable accommodation at any point in the hiring process, please direct your inquiries to [email protected].

About Roku

Roku is an American consumer electronics company founded in 2002. The company is best known for its streaming devices that allow users to access internet-based video content on their televisions. Roku's devices are available in several models and are sold in the United States and other countries. The company also offers a streaming service called The Roku Channel that features a selection of movies and TV shows. Roku went public in 2017 and is traded on the NASDAQ stock exchange.
  • Size: 3,000 employees
  • Market Cap: $5.5 billion
  • Net Income: -$17.5 million
  • Founded: 2002
  • 5 Year Trend: +47.3%
  • Revenue: $1.7 billion
  • Exchange: NASDAQ
