The Apprenda Managed Services team is responsible for the end-to-end customer experience across the Apprenda Cloud Platform and Cloud Native Container Services. As a member of this team you will be the first line of interaction for a variety of customers: helping them to architect complex cloud solutions and utilize Apprenda's services as well as those of our partners.
As a Customer Reliability Engineer (CRE), you will be responsible for the successful delivery of a managed offering: the design, build-out, fine-tuning and operation of a platform and the ecosystem that sustains it. This means building a close relationship with our customers, proactively identifying their use cases, monitoring for potential flaws, providing support for incidents, and working with the Client CTO team to prioritize and build automation around recurring issues.
Given the fast-paced ecosystem and community around Cloud Computing, CREs will work with a variety of configuration tooling, target infrastructure and monitoring products, to name a few. The ideal candidate is interested in continuous learning and will be empowered to tinker and prototype as part of their daily routine.
Candidates who excel in this position will be accelerated toward a career as a Technical Account Manager within 12 to 24 months.
- Provide support for the Infrastructure and Tooling responsible for operating a managed Platform.
- Interface with regional Technical Account Managers to provide support for the Managed Platform.
- Operate a shared pager duty across a team of CREs for SEV-1 Escalations.
- Monitor Support HelpDesk & Slack Channels for incoming requests from clients.
- Build and maintain a KB of known issues across customer implementations.
- Automate build-out (infrastructure, tooling and platform), monitoring (load testing, performance, log aggregation, etc.) and configuration that can be used across client accounts.
- Liaise with the CCTO Team to define and build standards based on market trends, including best practices, operating manuals, hardening guides, threat prevention/detection and recovery of services.
- Basic Development Experience (with a focus on scripting against REST APIs)
- Automation Experience (PowerShell, Scheduled Tasks, Cron Jobs)
- Infrastructure Automation Experience (Ansible, Chef, Terraform)
- Experience with Cloud Providers such as AWS, Azure, GCP, SoftLayer, or other Distributed Infrastructure Platforms
- Creative problem-solving, debugging and troubleshooting skills.
- Ability to operate in a fast-paced environment.
- Ability to work with a distributed team.