Site Reliability Engineer

Open Systems Technologies

New York, NY

Industry: Technology


Less than 5 years

Posted 164 days ago by Radhika Arora

This job is no longer available.

A global data-based financial firm is seeking a Site Reliability Engineer to join their team in New York. In this role you will help build large-scale distributed systems and develop mission-critical system infrastructure. You will be part of a team that builds the foundation to support a multi-cloud environment.


  • Identify and automate developer workflows. Provide development teams with self-service tools to provision infrastructure, deploy and manage applications, and manage their operational environments.
  • Implement industry-wide best practices around public and private cloud infrastructure. Adopt tools and technologies such as Terraform and Kubernetes that help abstract the underlying infrastructure.
  • Develop and maintain documentation, training, and SLAs for managed infrastructure and systems, socializing them and acting as agents of change.
  • Work closely with development teams to evolve legacy systems with modern, Internet-scale design patterns. One example the team is currently working on is the move to Kubernetes for stateless services.


  • 3+ years of experience working on highly available, fault-tolerant distributed systems
  • A strong understanding of operating systems and the nuances of Linux
  • Experience with datacenter network troubleshooting, including IP fundamentals, DNS, load balancing, proxies and firewalls
  • Familiarity with configuration management systems such as Chef, Puppet or Ansible
  • Proficiency in at least one of the following languages: Python, Ruby, C/C++, Go or Java
  • A solid understanding of modern software development lifecycle (SDLC) processes such as continuous integration and delivery
  • Expertise in analyzing and troubleshooting large-scale distributed systems
  • A deep understanding of web operations and cloud infrastructure (AWS, Azure, Google)
  • Knowledge of network and application performance analysis using standard UNIX tools
  • Experience with maintaining and managing a community around open source software