A leading global investment firm, managing a wide range of investment funds worldwide across multiple asset classes, is seeking a Cloud Reliability Engineer to join their team in New York. This firm encourages an entrepreneurial spirit and a drive to succeed. You will be empowered by top leaders in the industry to do your best work and propel your career forward.
The successful candidate will be an engineer with development skills and deep technical expertise on public cloud platforms. The role will be part of a cloud engineering team that is developing frameworks and tooling for automating and managing the deployment of applications in the cloud. The candidate will be focused on ensuring the reliability and resiliency of our cloud offerings. This position offers the opportunity to deliver significant value and drive technical innovation.
You must have deep experience managing and automating the lifecycle and operations of cloud infrastructure on AWS or Google Cloud using native tools, open source tools, and third-party products. You will also have experience developing production-ready code in one or more languages, including Python. You should also be familiar with developing unit and functional tests, and have experience with continuous integration as it applies to infrastructure as code.
The candidate should have experience architecting infrastructure to ensure the availability and resiliency of services and data. The candidate must be experienced with managing, persisting, and replicating data in different formats in the cloud, including databases, file systems, block stores, object stores, machine images, and containers. The candidate should be comfortable with Linux systems and containers as well as automating configuration management.
- Designing and building resiliency as a default into our cloud-based architecture
- Designing and automating tests that ensure the reliability of cloud-deployed applications
- Designing and automating deployment mechanisms such as Blue/Green and Canary
- Automating systems configuration and orchestration using tools such as Chef, Ansible, or Salt
- Automating creation of machine images and containers
- Designing CI/CD pipelines that include infrastructure, application, and security testing and gates
- Implementing availability, security, and performance monitoring and alerting
- Implementing load testing and capacity planning
- Automating data resiliency and replication based on policies
- Significant experience designing and supporting production cloud environments
- Strong coding skills in one or more languages, including Python
- Experience developing collaboratively, including infrastructure as code
- Experience developing automated tests, preferably in Python, to validate application and infrastructure functionality, security, and performance as part of an SDLC process
- Experience with cloud templating and automation tools for deploying and managing infrastructure
- Experience building CI/CD pipelines, including the use of cloud-native tools
- Experience with data management and protection strategies in the cloud
- Experience with key management as it pertains to data in cloud environments
- Experience monitoring applications using cloud-native, open source, and third-party tools
- Deep knowledge of cloud platform APIs and automation
- Degree in a STEM or related field preferred