Reliability Engineering Lead

Equifax

Alpharetta, GA

8-10 years

We are seeking a leader for our Reliability Engineering team. The ideal candidate will have advanced experience in enterprise and application performance monitoring, as well as a demonstrable track record of effective capacity monitoring. Experience with, and a passion for, automation and continuous improvement are also desirable. You will work directly with product development, application, and platform support teams and report directly to the Vice President.


Major - Delivery and Execution

  • Leads a team of Monitoring SMEs to achieve the following goals:

  • Collaborates with other teams to develop secure, reliable, efficient and scalable software services.

  • Works with Architecture, Development and Systems Engineering teams to develop innovative solutions that attain high availability, scalability, and reliability.

  • Works with internal and external teams to develop automation for tool configuration and functional certification.

  • Maximizes product reliability by implementing commercial monitoring software in line with rapidly evolving business needs.

  • Creates effective dashboards, reporting, alerting, and response procedures so that the impact of issues is either avoided or rapidly resolved.

Medium - Support and Collaboration

  • Develops the team to act as thought leaders within the enterprise.

  • Monitors the effectiveness of implementations and plans for continuous improvement in support of the organization's goals.

  • Provides first-line application support for automation and tools.

  • Proactively reviews system performance and capacity, and aligns with the customer roadmap to plan each release accordingly.

  • Familiar with the ITIL framework for change, incident, and problem management.

Minor - Learning

  • Proactively identifies learning opportunities to develop industry best practices and tool usage.

  • Proactively seeks out knowledge of new technologies and techniques and how they are benefiting other organizations.

Preferred Skills and Experience:

Years of Relevant Work Experience: 8-10 years

  • Proficient in production performance monitoring concepts and implementation.

  • Experience with the following or similar tools and technologies: AppDynamics, Datadog, Apica, Elastic Stack, Linux, Java, Oracle.

  • Understanding of production systems design concepts including Reliability, Security, High Availability and Disaster Recovery.

  • Understanding of infrastructure automation tools (e.g., Chef) and associated concepts.

  • Experience working in an application SaaS delivery channel with microservice-based architectures.

  • Bachelor's degree in Information Systems, Computer Science, or a related field.

  • Operations exposure: deployment configuration, sustainability, scaling patterns, load balancing, performance tuning, SLA management, and integration with enterprise systems.

  • Creativity in problem solving and analysis, particularly in resolving technical application issues.

  • Excellent verbal and written communication skills.

  • Experience working in a public cloud environment.

Preferred Skills

  • Experience managing compliance with the PCI Data Security Standard (PCI DSS).

  • Experience with web-based application development and industry trends.

Job Number: J00058619