Principal Site Reliability Engineer

iStreamPlanet

Las Vegas, NV

Industry: Professional, Scientific & Technical Services

11 - 15 years

Posted 42 days ago

Description

While we are headquartered in Seattle, we are seeking the best talent to join our team and will consider remote/telework options for candidates located anywhere in the US.

Who We Are:

iStreamPlanet delivers some of the world’s most prestigious and most viewed online experiences, such as the Olympics, Super Bowl, NCAA March Madness and FIFA World Cup. We are also helping traditional and new media brands, including Fox, NBC, Turner, DirecTV, Hulu and fuboTV, among others, reimagine their businesses by delivering their content directly to their consumers at scale via engaging online experiences. We are majority-owned by Turner and a subsidiary of WarnerMedia, much like our sister companies HBO and Warner Bros., but we are also uniquely positioned to serve the broader market in the US and globally.

iStreamPlanet’s engineers are empowered to do what’s right for our customers. We encourage creative solutions, trust and demand data, and generally gravitate toward open source. We have a learning culture and continually seek ways to deliver a better product. We maintain a very high bar for ourselves and depend on one another to meet national broadcast expectations. We are committed to delivering the best experience to the fan whenever and wherever.

For more info, check us out at https://istreamplanet.com/

The Job

The Principal Site Reliability Engineer is responsible for leading cross-team engineering discussions to achieve scalable, measurable, fault-tolerant, and cost-effective cloud services. The role connects Product and Operations teams with Engineering teams to identify and exceed business-critical service KPIs. It is a core team member of one or more projects where work is based on a wide range of complex problems and deliverables. It is responsible for developing and planning its own or the project team’s activities and for establishing service level agreements across iStreamPlanet services teams. It independently determines and develops approaches to solutions and timelines. It is a technical leader accountable for a cross-team operational excellence program, end-to-end data awareness across solutions, an active post-mortem culture with cross-team learning, and continuous improvement in iStreamPlanet’s service reliability.

The Day-to-Day

  • Partner with peer engineering teams throughout the full software development lifecycle.
  • Design, analyze, and troubleshoot fault-tolerant, distributed systems. Provide fellow engineering teams with systems design and scalability expertise.
  • Continually improve customer outcomes through quantitative service monitoring, alarming, and direct code improvements to our services.
  • Contribute expertise in Unix/Linux operating system internals, administration, and TCP/IP networking.
  • Promote a cross-team culture of operational excellence and customer obsession within iStreamPlanet.
  • Provide technical oversight and mentoring to other team members.
  • Establish and implement cross-team disaster recovery contingencies with at least quarterly validation with external and internal stakeholders.
  • Establish and implement cross-team service availability key performance indicators in partnership with Product Management.
  • Establish and implement cross-team service error budgets, consisting of Service Level Indicators, Service Level Objectives, and Service Level Agreements in partnership with Engineering and Product Management leadership.
  • Partner with Engineering Directors and Product Management to develop quarterly service level agreements for iStreamPlanet’s engineering teams.
  • Establish and implement a monthly lecture series for iStreamPlanet team members that targets distributed computing fundamentals, service analytics, and fault-tolerant design patterns.

The Essentials

  • Bachelor’s degree in Computer Science or equivalent experience.
  • 10+ years of commercial software development experience in distributed cloud services.
  • Strong understanding of one or more industry-standard languages (e.g. Go/C/C++/C#/Java/Swift/Python).
  • Experience working with Open Source solutions.
  • Strong experience with algorithms, data structures, complexity analysis, and software design.
  • Strong design skills, including design patterns and common software frameworks.
  • Passion for DevOps, continuous improvement, and nurturing a sustainable post-mortem culture, with a commitment to driving down live-site overhead through a systematic problem-solving mindset, strong collaboration skills, and an eagerness to take ownership and drive results.
  • Ability to obsess over customer needs and demonstrate customer empathy.
  • Proven ability to work and solve problems both independently and collaboratively, to organize workload and priorities, and to demonstrate high-quality execution, technical innovation and adoption, and initiative.