Site Reliability Engineer

Exact Sciences

Madison, WI

Industry: Professional, Scientific & Technical Services


Less than 5 years

Posted 43 days ago

Summary of Major Responsibilities

This position is focused on providing strategic direction for, and executing, infrastructure, security, continuous integration, deployment, and IT operations practices, including scaling and metrics, as well as running the day-to-day operations of production and development infrastructure for cloud-based hosted platforms.

The Site Reliability Engineer will work with other Software Engineers, Database Engineers, and Product Managers to analyze system and network loads to address stability and performance challenges, and collaborate with others to operate various systems. The Site Reliability Engineer performs ongoing application support by diagnosing and resolving issues, maintaining applications, and evaluating and recommending options for improving performance, maintainability and operability. This also includes streamlining processes to increase system scalability and reliability, improve efficiency, and minimize errors.

Essential Duties and Responsibilities

  • Ability to work with and use Amazon Web Services or other Cloud technology platforms, leveraging CloudFormation, Terraform, Packer, Ansible, Python, Bash, Java, JavaScript, Linux
  • Understanding of security and encryption best practices
  • Responsible for designing, building, maintaining, and scaling production services and server farms across multiple data centers for complex and data-intensive cloud services
  • Design and enhance software architecture to improve scalability, service reliability, capacity, and performance
  • Write automation code for provisioning and operating infrastructure at massive scale. You are not an operator; you are an experienced software engineer focused on operations
  • Work with development teams to make sure the applications fit well within the infrastructure and that scalability and reliability are designed and implemented from the ground up. You will work with QA on building pipelines and automation for delivering and deploying applications to production
  • On-call rotation supporting the infrastructure
  • Roll up your sleeves to troubleshoot incidents, formulate and test hypotheses, and narrow down possibilities to find the root cause
  • Participate in postmortem reviews and remediation recommendations
  • Identify bad trends before they become problems; respond to automated system alerts, effectively troubleshoot system errors and work incidents to return systems to normal operating conditions
  • Author and update high-quality documentation of all relevant specifications, systems and procedures
  • Other duties as assigned


Minimum Requirements

  • College diploma in CS/Engineering/Sciences or equivalent experience
  • 1+ years of experience with design capabilities using modern technologies
  • Track record in successfully addressing performance, scalability and latency challenges
  • Experience in developing systems architecture