Site Reliability Engineer

IBM

Austin, TX

5 - 7 years

Posted 216 days ago

This job is no longer available.

Job Description

IBM Cloud Brokerage Services is IBM’s solution for Hybrid Cloud Enablement, giving our clients’ IT organizations visibility and governance without sacrificing speed and business agility.
Our solution is built on our recent acquisition of Gravitant. We continue to operate with a startup mentality but with access to the tremendous market reach of IBM. We are global in scale, with customers in Europe, North America, South America and Asia Pacific. We are pan-industry in scope, delivering to a client base that spans a range of industries, including telecommunications, retail, aerospace and financial services.

IBM Cloud Brokerage is a purpose-built suite of applications that enables a self-service ability to
browse, search, order and fulfill services powered by a comprehensive, curated IT as a Service
catalog spanning Public, Private and Hybrid Clouds and Traditional IT providers. It is a core
component of IBM’s strategic investment in the IBM Services Platform with Watson (ISPW), a
complete and automated IT as a Service environment powered by the unmatched cognitive
capability of Watson.

The Cloud Brokerage Site Reliability Engineer will be part of a group deploying and managing
complex Enterprise software solutions in the areas of cloud brokerage, cloud management, data center transformation, Enterprise Hybrid Cloud Architectures and IT Governance.
Our delivery organization is made up of functional teams managing (a) Client Advocacy, (b)
Client Onboarding and Transformation, (c) Client Solution Engineering and (d) Client Services
and Enablement.

The Brokerage Site Reliability Engineer position is responsible for:
• Designing, analyzing, and troubleshooting large-scale distributed systems
• Participating in the on-call rotation
• Engaging with product teams to fix production outages and carrying forward action items to improve ongoing reliability
• Developing effective tooling, alerts, and responses to identify and address reliability risks, including automatic problem detection and mitigation
• Managing end-to-end availability and performance of Cloud Brokerage services and building automation to prevent problem recurrence, with the eventual goal of automating the response to all non-exceptional service conditions
• Designing, writing, and delivering software to improve the availability, scalability, latency, and efficiency of Cloud Brokerage services

As a Cloud Brokerage Site Reliability Engineer, you should possess the following skills:
• DevOps Mindset
• You enjoy solving difficult engineering problems and don’t mind getting your hands dirty
• You approach troubleshooting systematically and have a deep sense of ownership for whatever you work on
• Ability to identify the root causes of instability in a high-traffic, distributed system
• Passion for resolving reliability issues and identifying strategies to mitigate them going forward
• Willingness to work in an ever-changing environment
• You are passionate about automation and innovations that improve productivity

Required Technical and Professional Expertise

• Experience with Cloud Computing platforms – IBM Cloud, AWS, Azure, Google Cloud Platform – 3+ years
• Strong Linux system-level analysis capabilities – 5+ years
• Experience in operating highly available distributed systems, in particular microservices, in a cloud environment – 1+ years
• Experience in at least one scripting language, Python preferred – 2+ years
• Sound understanding of CI/CD systems as well as experience in running containerized applications using tools such as Docker and Kubernetes – 1+ years
• Experience with configuration and troubleshooting of Linux, Java/Scala, Docker systems – 1+ years
• Experience in operating RDBMS and NoSQL databases – 3+ years
• Experience with Java, Elasticsearch, Kibana, Logstash and Grafana – 2+ years
• Understanding of large-scale complex systems from a reliability perspective – 5+ years

Preferred Technical and Professional Experience

• Proficiency in algorithms, data structures, complexity analysis and software design, and expertise in Unix/Linux systems, IP networking, and performance and application issues – 5+ years

Eligibility Requirements

  • None