Site Reliability Engineer

Fidelity

$90K — $130K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • ~2 years of experience in SRE, software engineering, DevOps, or production engineering.
  • Strong coding skills in Node.js, JavaScript, TypeScript, and Python.
  • Experience building tools, APIs, or automations for operational challenges.
  • Familiarity with, and interest in, AI-assisted development workflows such as GitHub Copilot.
  • Foundational knowledge of monitoring, logging, and distributed systems concepts.
  • Exposure to cloud platforms, CI/CD pipelines, and modern development practices.

Responsibilities

  • Build software solutions to enhance reliability and scalability of production systems.
  • Develop production-quality code with thorough testing and documentation.
  • Utilize modern development tools and AI-assisted tools to streamline workflows.
  • Own features and improvements from design to deployment and validation.
  • Engage in on-call support and effective incident response management.
  • Analyze incidents to identify patterns and implement sustainable fixes.
  • Improve observability for owned services by implementing logging, metrics, and alerts.

Benefits

  • Participation in on-call rotations for incident response.
  • Opportunity to work with modern AI tools and practices.
  • Career progression opportunity toward a Grade 5 SRE role.
  • Encouragement of continuous learning and professional development.
  • Collaborative environment across engineering and operations teams.

Full Job Description

Job Description:

Site Reliability Engineer (SRE) – Grade 4

Production Support Engineering

We are looking for engineers who solve operational problems by building software. In this role, you will improve reliability, reduce toil, and enhance production systems by writing code, building automations, and leveraging modern AI-assisted development tools.

This is a hands-on engineering role—not a traditional support position. You’ll use Node.js/TypeScript, Python, and AI tools like GitHub Copilot to design and deliver solutions that make systems more reliable and operations more scalable.

What You’ll Do
  • Build software solutions to improve reliability, reduce operational toil, and scale production systems—not just respond to issues.
  • Develop production-quality code using Node.js / JavaScript / TypeScript and Python/PowerShell, including testing and documentation.
  • Leverage modern development tooling such as VS Code and AI-assisted tools (e.g., GitHub Copilot) to accelerate delivery and problem-solving.
  • Independently own well-scoped features, fixes, or improvements end-to-end—from design through deployment and operational validation.
  • Participate in on-call rotations, respond to incidents, execute runbooks, and ensure clear communication and handoffs.
  • Analyze incidents and recurring issues to identify patterns, reduce alert noise, and implement durable fixes.
  • Implement and improve observability (logging, metrics, dashboards, alerts) for owned services.
  • Build automations, scripts, and lightweight tools to eliminate repetitive manual work and improve operational efficiency.
  • Identify and act on opportunities to improve system reliability, performance, and maintainability.
  • Develop an understanding of how systems impact customer experience and business outcomes.

The Expertise and Skills You Bring
  • 2+ years of experience in SRE, software engineering, DevOps, or production engineering.
  • Strong hands-on coding skills with emphasis on:
    • Node.js / JavaScript / TypeScript
    • Python for scripting and automation
  • Experience building tools, APIs, or automations to solve engineering or operational problems.
  • Familiarity with AI-assisted development workflows (e.g., GitHub Copilot, code generation tools) and interest in applying AI/LLMs to improve engineering productivity.
  • Foundational knowledge of:
    • Monitoring, logging, and observability concepts
    • Distributed systems and API-based architectures
    • SQL and data analysis for troubleshooting
  • Exposure to cloud platforms (AWS or Azure), CI/CD pipelines, and modern development practices.
  • Basic understanding of incident management, problem management, and production support processes.

What We’re Looking for in You

  • A self-starter who proactively identifies key issues and trends, performs thoughtful analysis, and develops creative, high-impact solutions that deliver measurable value.
  • A builder mindset: you instinctively solve operational problems by writing code and creating automation.
  • A modern engineer who leverages AI tools to increase speed, quality, and effectiveness.
  • Strong problem-solving skills with the ability to troubleshoot, analyze, and deliver practical solutions.
  • Proactive approach to identifying risks, inefficiencies, and improvement opportunities.
  • Curiosity and desire to continuously learn systems, tools, and reliability engineering practices.
  • Clear communicator who collaborates effectively across engineering and operations teams.
  • Growing autonomy and consistency aligned with progression toward a Grade 5 SRE role.

Category:

Information Technology