Site Reliability Engineer

Qumulo

Seattle, WA

5 - 7 years

Posted 275 days ago

This job is no longer available.

About the position:

As a Qumulo Member of Technical Staff in our Engineering Tools & Infrastructure team, you will architect, develop and maintain the production services for monitoring Qumulo clusters deployed in our customers’ data centers and the systems used to support our product development work.

We are looking for someone with strong analytical and troubleshooting skills, fluency in coding and systems design, solid communication skills and a desire to tackle the complex problems of scaling a young organization's infrastructure to maturity.

About the company:

Qumulo is a Seattle-based data storage startup. We are building solutions that will permanently change the enterprise storage marketplace, improving quality and increasing service standards. We are dedicated not only to building a fast, reliable product, but also to providing customers with a seamless user experience and unprecedented visibility into their data.

Founded in 2012 by the inventors of scale-out NAS, our vision has attracted a team of pioneers from Amazon Web Services, Google, and Microsoft. Our mission is simple – to be the company the world trusts to store, manage and curate its data.

Responsibilities:

  • Production operation responsibilities for keeping our customer monitoring service operational and responsive as our customer base grows
  • Production operation responsibilities for keeping the engineering build and continuous integration (CI) systems operational
  • Participate in capacity planning for our development and test infrastructure
  • Set standards for how to integrate with the build and CI systems
  • Educate and support Agile teams in integrating with CI systems

Requirements:

  • BS or MS degree in Computer Science or related technical field (or equivalent practical experience)
  • 4+ years of relevant work experience
  • Strong Python and/or Ruby development skills
  • Experience managing development and test environments
  • Excellent troubleshooting and debugging skills
  • Deep experience with virtualization platforms, monitoring tools, and networking
  • Ability to handle periodic on-call duty as well as out-of-band requests
  • Experience with Puppet, Chef, Ansible, and/or Salt
  • Expertise in analyzing and troubleshooting large-scale distributed systems
  • Knowledge of IP networking, and of analyzing network performance and application issues using standard tools such as tcpdump

Qumulo is an Equal Opportunity Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, disability, military status, national origin, or any other characteristic protected under federal, state, or applicable local law.