Site Reliability Engineer, NOC in Seattle, WA

$100K - $150K (Ladders Estimates)

Nordstrom

Seattle, WA 98160

Industry: Retail & Consumer Goods

5 - 7 years

Posted 56 days ago

Job Description

Nordstrom is hiring in our Site Operations Center as part of our Site Reliability Engineering group. We're obsessed with driving down our production issue count, learning from the issues we do have, and lowering the time to repair any issues that occur.

If you want to be part of a team of engineers that monitors and troubleshoots Nordstrom's infrastructure 24x7, serves as the eyes-on-glass first point of contact for all issues, and works closely with our Incident Response and Site Reliability teams, and if you thrive in an intense, fast-paced, highly visible environment, then we should talk.

A day in the life...

Providing Tier 1 support for application and infrastructure issues across the enterprise

Monitoring, triaging, and coordinating incident response when service failures, infrastructure issues, or deployment issues occur

Hands-on analysis and troubleshooting in production

Identifying, defining, and building improvements to support tools, processes, and the service itself

Improving the customer experience by delivering new service monitoring, alarming, and scripting

You own this if you have...

Familiarity with site and infrastructure monitoring systems (such as AWS CloudWatch and Datadog)

Experience with UNIX/Linux SysOps tasks, including expertise in administration, monitoring, troubleshooting, performance tuning, preventative maintenance, and capacity planning

Knowledge of networking (TCP/IP, routing, network topologies and hardware, SDN, etc.)

Broad understanding of large-scale system architecture, automation, integration, and processes

Ability to debug and optimize code and to automate routine tasks

Ability to work night/weekend shifts

4+ years of work experience with production Linux systems administration

2+ years of experience with configuration management, source control, and containerization tools

2+ years of work experience managing cloud-based infrastructure and automation

2+ years of experience with at least one scripting language (e.g., Bash, Python, Ruby, Go)

Motivated, critical thinker with proven skill in troubleshooting and solving problems in a production support environment

Ability to successfully manage competing priorities in critical incident situations

Strong desire to learn and understand new technologies

Excellent verbal and written communication skills

Experience working with ITIL and Service Management best practices is a plus

Education:

Bachelor's Degree or equivalent experience.

We've got you covered...

Our employees are our most important asset and that's reflected in our benefits. Nordstrom is proud to offer a variety of benefits to support employees and their families, including:


  • Health
  • Retirement
  • Time Off
  • Merchandise Discount
  • Lifework / EAP resources


Valid Through: 2019-10-20