Site Reliability Engineering Lead, Specialist

Vanguard Group, Inc.

$120K–$150K
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Expertise in JavaScript (both server-side and client-side) or Java.
  • Working knowledge of Python or a similar scripting language.
  • Strong understanding of resiliency engineering techniques for platforms and applications.
  • Proven experience troubleshooting complex production issues and implementing effective mitigations.
  • Hands-on experience with AWS services and managing cloud infrastructure.
  • Familiarity with the OpenTelemetry specification and APIs.
  • Practical experience developing and operating software in distributed systems.

Responsibilities

  • Improve resiliency engineering practices across platforms and applications.
  • Detect, troubleshoot, and resolve incidents effectively.
  • Automate incident response and infrastructure management processes.
  • Develop and support OpenTelemetry integrations across various platforms and languages.
  • Contribute to architectural decisions and assist in implementing solutions.

Benefits

  • Collaborative and innovative working environment.
  • Opportunity to solve impactful operational problems.
  • Engagement in advanced resiliency engineering practices.
  • Hands-on experience with cutting-edge technology and cloud infrastructure.
  • Focus on personal development and continuous learning opportunities.

Full Job Description
We are seeking an experienced engineer with broad, end-to-end software development experience, including operating applications in a microservices environment in production at scale. This role goes beyond feature implementation: it requires someone who can design, build, and support resilient systems from the ground up.

As a Senior Reliability Engineer at Vanguard, you will play a critical role in solving impactful operational problems. You are curious and take a proactive approach to identifying problems and making improvements. You balance innovative thinking with pragmatism and understand the long-term impacts of technical decisions. You communicate complex ideas clearly and collaborate effectively to deliver scalable solutions.

Core Responsibilities
  • Improve resiliency engineering practices across platforms and applications, including resilient application design patterns, system observability, and deployment strategies.
  • Incident detection, troubleshooting, and resolution.
  • Develop automation for incident response and infrastructure management.
  • Develop and support OpenTelemetry integrations for multiple application platforms (browser, ECS, Lambda, etc.) and languages (JavaScript, Java).
  • Contribute to architectural decisions and support implementation of solutions.

Skills and Qualifications
  • Expertise in JavaScript (server-side and client-side execution environments) or Java.
  • Working knowledge of Python (or a similar scripting language).
  • Strong knowledge of resiliency engineering techniques for both platforms and applications.
  • Experience troubleshooting complex production issues and implementing effective mitigations.
  • Hands-on experience with AWS services and cloud infrastructure.
  • Familiarity with the OpenTelemetry specification and core APIs.
  • Practical experience developing and operating software in distributed systems environments.

Special Factors

Sponsorship
Vanguard is not offering visa sponsorship for this position.
