Query Language - Reliability Engineer
A prominent, data-driven global technology firm is seeking a Query Language - Reliability Engineer to join its team in New York. The company has several areas of focus, chief among them building one of the world's most widely used financial software applications. In this role you will work in a small, dynamic team alongside some of New York City's top engineers to solve unusual, large-scale challenges. The systems are very large and highly distributed, and the engineers are always looking for creative ways to solve problems, drawing on a variety of modern programming languages, open-source and big data technologies, as well as Machine Learning and Natural Language Processing.
Beyond the opportunity to grow professionally, the company offers an excellent engineering culture, work/life balance, and great benefits. The company is also very philanthropic: many employees give back to the community, and the company donates a significant portion of its profits to philanthropy. This is an exciting opportunity for anyone looking for the next step in their career and for a workplace they can call home.
The successful candidate will join a team that is pushing the envelope to lead the low-latency analytics space in the financial domain. The team is developing a cloud-based, low-latency Analytics and Screening Platform for huge financial data sets. It is also creating a Financial Query Language that lets users express complex data retrieval, analytics, and screening requests.
You will ensure that the production systems are healthy, monitored, automated, and designed to scale. We'll depend on you to improve the overall reliability of clusters through stress tests, chaos engineering, failovers, and auto-recovery. You'll develop tools for continuous integration, automated software releases, configuration management, and system management.
- Own, manage, monitor and optimize the reliability and overall health of our development and production environments
- Work closely with development teams to define standards and ensure that applications are designed with scale, resilience, and performance in mind
- Streamline software development with continuous integration, deployment automation and agile configuration management
- Build tools to reduce toil and increase insight into trouble spots
- Implement effective governance controls in our development lifecycle
- Manage resiliency design and planning, and the collection and analysis of availability metrics
- Monitor current capacity, conduct regular capacity testing and predict future capacity needs
- 3+ years of experience in a relevant role (DevOps, Reliability Engineering, Software Development)
- Strong knowledge of UNIX or Linux systems running distributed application platforms
- Demonstrated experience managing performance, availability and scale of mid- to large-sized systems
- Hands-on experience with production deployment and release management
- Hands-on experience with the Java programming language
- JVM tuning and profiling
- Containerization and orchestration technologies (like Docker, Kubernetes, Mesos)
- Configuration management tools (like Chef, Puppet, Ansible)
- Continuous integration, deployment, and code quality tools (like Jenkins, Bamboo, SonarQube)
- Distributed caches (like Redis)
- Knowledge of distributed computing