Reliability Engineer

Facebook

Menlo Park, CA

Industry: Technology


Experience: Not specified

Posted 4 days ago

Facebook is seeking a Reliability Engineer to work with our hardware design, software engineering, and manufacturing partner teams to deliver robust and hardened infrastructure solutions. This is a full-time role based at our office in Menlo Park, CA.


  • Identify and define metrics and key performance indicators (KPIs) for Facebook's fielded infrastructure hardware
  • Use metrics to develop reliability models and predictive failure analytics
  • Maintain and enhance the fielded hardware failure sampling program that drives failures to root cause
  • Partner with multiple engineering teams to develop corrective actions against key hardware failures to maintain field performance within goal
  • Partner with Facebook teams to develop reliability requirements for compute, storage, network, and other data center hardware
  • Participate in new product design reviews and ensure learnings from hardware failures are integrated into new designs
  • Drive data collection and data quality enhancements that support reliability analysis at scale and enable new capabilities
  • May require up to 10% international travel


  • Experience with Hive, MySQL, or other data warehouse technologies, including delivering metrics end-to-end
  • Experience in reliability methods, modeling, and testing practices
  • Experience using statistical software tools such as JMP or Minitab
  • Knowledge of mechanical or electrical design, material sciences, and mechanisms of consumer electronics


  • BS in Electrical, Mechanical, Computer Engineering, Material Sciences, or equivalent
  • MS or PhD in these fields
  • Knowledge of servers, storage, or networking technologies and equipment