Plaid's Infrastructure team builds the platforms and tooling that help engineering teams develop, deploy, and operate production systems safely. Release Engineering owns the path from merge to production, including Plaid's zero-touch deployment system, progressive rollouts, metric-gated analysis, and automatic rollback. Our goal is to make safe shipping the default for every product team.
As a Staff Site Reliability Engineer on Release Engineering, you'll define and scale Plaid's reliability practices across product engineering. You'll architect our SLO and error-budget programs, drive the adoption of progressive delivery, and ensure new products are production-ready. By partnering across product and platform teams, you'll translate complex production needs into intuitive, self-service tooling. This is a hands-on technical leadership role where you'll shape the future of our deployment systems, ensuring they remain fast and safe even as AI-assisted development increases code velocity.
What excites you
- Lead the expansion of reliability standards across product engineering, converting foundational infrastructure into lasting operational habits and tooling.
- Architect and manage the SLO and error-budget framework, empowering teams to utilize reliability data for strategic product and release choices.
- Promote widespread use of progressive delivery and automated safety gates, ensuring high velocity without compromising production stability.
- Guide emerging product teams toward production readiness through expertise in observability, incident response, and scalable deployment health.
- Collaborate with SRE, Platform, and Infrastructure teams to transform complex production requirements into intuitive, self-service platform features.
- Direct the response to critical incidents and ensure the resulting post-mortem actions yield permanent improvements to the platform.
- Prepare for an AI-driven development landscape by scaling our safety nets to handle an increased volume and frequency of code changes.
What excites us
- Over 8 years of professional experience in backend systems, SRE, or platform engineering roles.
- Proven track record of designing reliability programs, such as service maturity models or SLI frameworks, that achieved cross-team adoption.
- Direct experience building or operating canary rollout systems, metric-gated analysis, or automated rollback infrastructure.
- Technical proficiency in software development, with a preference for Go or similar systems languages.
- Ability to drive organizational change and influence engineering culture without formal authority.
- Sound technical judgment in high-stakes production scenarios, balancing user impact with developer velocity.
- Prior exposure to Kubernetes, service mesh technologies, Prometheus, or ArgoCD is considered a strong asset.
Our culture is rooted in impact and collective growth. We seek technical leaders who resonate with our principles of inventing tomorrow and embracing openness.
Our mission at Plaid is to unlock financial freedom for everyone. To support that mission, we seek to build a diverse team of driven individuals who care deeply about making the financial ecosystem more equitable. We recognize that strong qualifications can come from both prior work experiences and lived experiences. We encourage you to apply to a role even if your experience doesn't fully match the job description. We are always looking for team members who will bring something unique to Plaid!
Additional compensation in the form(s) of equity and/or commission are dependent on the position offered. Plaid provides a comprehensive benefit plan, including medical, dental, vision, and 401(k). Pay is based on factors such as (but not limited to) scope and responsibilities of the position, candidate's work experience and skillset, and location. Pay and benefits are subject to change at any time, consistent with the terms of any applicable compensation or benefit plans.