ML Infrastructure Engineering Manager

5 - 7 years experience  •  Business Services

Salary depends on experience
Posted on 09/21/17
Santa Clara, CA

  • Job Number: 112979394
  • Santa Clara Valley, California, United States
  • Posted: 30-Aug-2017
  • Weekly Hours: 40.00

Job Summary

We’ve laid the groundwork for machine learning within Siri Operations, and now we are looking for someone to take it to the next level to improve the reliability of our systems and services. Your first initiative is to assemble a team of engineers to build the foundation of our machine learning platform for operations. You will be required to work with multiple groups across the organization focusing on reliability, performance, security, and other big data concerns.

Key Qualifications

  • Minimum of 5 years of experience building large-scale consumer-facing software.
  • At least 2 to 3 years in an engineering managerial role.
  • Minimum of 2 to 3 years developing systems for machine learning and anomaly detection.
  • At least 2 years working with or in an operations environment.
  • Expertise in ML, data modeling, anomaly detection, and statistical analysis.
  • Experience in programming languages used heavily in machine learning such as Python, Java, Scala, or R.
  • Hands-on experience developing on high-performance compute clusters and frameworks such as HDFS, Spark, and Kafka.
  • Experience with time series and logging systems such as Graphite and Splunk is a plus.


You will manage a team of machine learning and operations engineers. You will be responsible for driving initiatives to collect and extract information such as usage patterns, performance regressions, and systemic problems from the metrics and logs of our systems and services. You must be comfortable coordinating cross-functional efforts with different teams in the organization. You must be hands-on while working closely with others in a fast-paced environment with rapidly changing priorities.

  • Design and engineer anomaly detection and predictive systems by analyzing metrics and logs from systems and services.
  • Quickly break down projects into tasks and milestones, establish priorities, track progress, and identify and resolve blockers.
  • Lead and recruit engineers with expertise in IT operations and machine learning.
  • Break down a high-level vision into concrete requirements with little guidance.
  • Maintain a strong culture of development practices, including automated testing, robust code design, and documentation, to ensure high-quality results.


Bachelor's or Master's degree in Computer Science, or equivalent experience.
