Site Reliability Engineer in Virtual / Travel

$80K - $100K (Ladders Estimates)

Mesosphere

Virtual / Travel

Industry: Enterprise Technology


Less than 5 years

Posted 53 days ago

We don't mind getting into the weeds with hard-to-diagnose networking issues, and we troubleshoot such problems by drawing on our years of frontline experience firefighting in large-scale web operations. Some of us had experience with Mesos before coming on board at D2iQ, and some of us didn't. Either way, a strong understanding of distributed systems and systems engineering is key to our success. We were solving Site Reliability Engineering problems through code before SRE or DevOps became terms. We take pride in creating software that people rely on and that is a joy to use.


  • Architect, build, and maintain systems that our engineering team and customers rely on
  • Contribute to documentation for both our customers and other engineers
  • Make DC/OS the easiest operating system to deploy, manage, and monitor at scale
  • Take responsibility for the third-party services and production infrastructure on which DC/OS operates
  • Partner with other engineers to design, build, and maintain critical systems
  • Consistently work to make our software simpler
  • Effectively estimate time to implement designs
  • Challenge yourself and your peers to always improve

Basic Qualifications

  • Expert-level knowledge of at least one high-level programming language, such as Python or Go
  • 3+ years of experience with production infrastructure
  • Experience designing and operating large-scale infrastructure running on AWS, GCP, Azure, or other cloud providers
  • Able to debug, troubleshoot, and resolve complex technical issues reported by customers
  • Background in system administration, operations, or site reliability
  • Understanding of network protocols and networking in general
  • Deep knowledge of Linux fundamentals

Preferred Qualifications

  • Production experience with service-oriented architectures and distributed systems such as Mesos, Kafka, Cassandra, Hadoop, and ZooKeeper
  • An extremely clear, concise, and effective communicator
  • Production experience with container systems such as Docker or rkt
  • Strong sense of ownership, urgency, and drive
  • Self-driven and motivated, with a strong work ethic and a passion for problem solving

Valid Through: 2019-10-18