We need someone who is passionate about automation, infrastructure as code, and configuration as code, and who can develop and deploy software that improves the availability, management, and visibility of Wavefront's services. In this role, you will take part in the on-call rotation for running the Wavefront services and drive improvements to continuously increase the signal-to-noise ratio. You will also contribute to the development of tools for metrics gathering, introspection, monitoring, automated remediation, and orchestration.
We are also looking for someone who is willing to obtain a security clearance.
Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?
- You will demonstrate a commitment to reducing Mean Time to Resolution (MTTR), solving each technical issue with the goal of taking steps to ensure it doesn’t happen again.
- You will drive continuous improvements in our products by providing opinionated input in feature workstreams.
- You will drive assigned projects to completion, being clear when tradeoffs are needed and deadlines need to be adjusted to accommodate higher-priority work.
- You will demonstrate knowledge of cloud architecture security, scaling, and management principles and have experience working with AWS, GCE or Azure cloud infrastructures.
- You will help drive the deployment and maintenance of production services using container technology such as Kubernetes, EKS, or ECS.
What type of work will you be doing? What assignments, requirements, or skills will you be performing on a regular basis?
- Act as a leader on the team by mentoring others, working collaboratively with the Wavefront product engineering team, and scoping and executing projects well.
- Be passionate about learning new technologies and adopt the right tools to manage these services in production, keeping SLAs and MTTR in mind at all times.
- Understand the Wavefront architecture, discover failure points, and work with other teams to design tools and alerts that prevent future issues.
- Drive reliability improvements within the product by providing feedback to the product management team, informed by using the Wavefront service to monitor the production environment and acting as customer zero.
- Identify, scope and build tools to reduce the operational load on engineers.