Site Reliability Engineer

IBM

Virtual / Travel

8 - 10 years

Posted 210 days ago

This job is no longer available.

The Next Generation Cloud Network Engineering (NextGenCloud) team is dedicated to ensuring that the IBM Cloud is at the forefront of cloud technology, from data center design to network architecture to storage and compute clusters to flexible infrastructure services. While our focus is on Network as a Service (NaaS), we are part of the team building IBM's next generation cloud platform to deliver performance and predictability for our customers' most demanding workloads, at global scale and with leading efficiency, resiliency, and security. It is an exciting time, and as a team we are driven by this incredible opportunity to thrill our clients. We are looking for a Site Reliability Engineer who innovates and shares our passion for winning in the cloud marketplace to join our team.
 
This position is for a mid- to senior-level Site Reliability Engineer with at least 7 years of industry experience maintaining, or assisting in maintaining, site reliability. In this role, you will work as a member of the Site Reliability team with the following key responsibilities:

  • Troubleshoot and debug software delivered by various development teams within NextGenCloud.
  • Provide detailed trouble reports back to the development teams including automated methods to reproduce any defects.
  • Assist in troubleshooting and maintaining pre-production CI/CD systems in support of deployment.
  • Work with the team to ensure automation and the highest level of determinism possible in the installation and configuration of new systems (software and hardware).
  • Document automation and the interaction of software and systems as necessary to enable others.
  • Maintain services post-deployment through data collection and monitoring, ensuring the overall health of the services provided.
  • When on call, participate in troubleshooting and support other teams with their issues.
  • Participate in retrospectives.
  • Engage in and encourage collaboration, with a focus on issue resolution.
  • Engage in meaningful planning to improve software, systems, and processes.

To summarize, in this role you will engage in all aspects of the lifecycle of IBM’s NaaS, from idea to architecture and through deployment, operation, and improvement, ensuring that our clients have the most reliable and performant experience possible.

This opportunity is for someone in the continental United States.

Job Requirements

  • 7+ years’ experience with systems and/or software engineering.
  • 2+ years’ experience with software development.
  • 2+ years’ experience with systems engineering.
  • 2+ years’ experience troubleshooting software.
  • Experience in a DevOps environment.
  • Experience with Git.
  • Experience with OpenStack or a similar proprietary cloud such as Azure or AWS.
  • Familiarity with CI/CD systems and their pipelines; experience with Zuul or Jenkins a plus.
  • Familiarity with containers and HA clusters; experience with Docker and Kubernetes a plus.
  • Excellent knowledge of TCP/IP networking.
  • Strong background in network engineering a plus.
  • Hands-on data center operational experience a big plus.
  • Proven ability to collaborate and work well within a team.
  • Ability to communicate effectively both verbally and in writing.

Preferred Technical and Professional Experience

  • 10+ years’ experience in all of the above.
  • DevOps experience working with Ansible, Puppet, or Chef.
  • Experience with data center layout planning.

Eligibility Requirements

  • None

143340BR