These tools include test frameworks and infrastructure; logging, monitoring, and metrics collection; dashboarding and alerting; and coordination and deployment for distributed systems.
As a member of this group of versatile full stack engineers, your remit will include:
- Acting as the technology arm of Reliability Engineering – Two Sigma’s core DevOps organization;
- Developing or improving foundational technology used by Two Sigma’s engineering teams to build distributed services;
- Improving all aspects of software reliability, including better monitoring, alerting and documentation;
- Engaging with our software engineering teams on improvements to our tools, processes and software;
- Supporting some core services used by Two Sigma’s engineering teams.
Required qualifications:
- A bachelor’s degree in computer science or another highly technical, scientific discipline.
- Ability to program (structured and OO) in one or more high-level languages (such as Python, Java, C/C++).
- Familiarity with open-source tools used for deployment, logging and monitoring (e.g. Ansible, Elasticsearch, InfluxDB, Prometheus).
- Familiarity with resource management frameworks such as Mesos, Kubernetes and YARN.
- A proven track record of automation and an algorithmic approach to solving problems.
Additional skills preferred:
- In-depth knowledge of and experience in at least one of the following, along with a desire to learn more: host-based networking, Linux/Unix administration, systems programming, distributed systems, databases, or cloud computing.
- A proactive approach to spotting problems, areas for improvement, performance bottlenecks, etc.
- An understanding of the operational concerns in a demanding environment; ideally, but not necessarily, in finance.
- The ability to understand the inherent trade-offs between various software architectures as they relate to performance, resiliency and fault tolerance, load balancing, and data consistency.