Lead Software Engineer - Reliability

Nubank

$120K — $150K *
Miami, FL 33186
In-Person
Enterprise Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5-7 years of experience in production environments with a focus on system reliability
  • Proficient in defining and managing SLOs and SLIs with practical application in product decisions
  • Demonstrated capability in incident response management and conducting effective postmortems
  • Hands-on experience with observability tools for diagnosing production issues
  • Deep understanding of system design principles in distributed services and API architecture
  • Familiarity with cloud environments and CI/CD pipelines
  • History of providing technical leadership and mentorship in high-ownership settings

Responsibilities

  • Define SLOs and manage error budgets with product and engineering teams
  • Build and maintain observability layers for early issue detection and resolution
  • Lead incident response and drive improvements from blameless postmortems
  • Reduce operational toil through engineering solutions and automation
  • Perform production hardening to ensure system resilience under various stressors
  • Facilitate safe and rapid software changes using progressive delivery techniques
  • Enhance developer experience by addressing operational friction points
  • Collaborate with cross-functional teams to translate risk and reliability into actionable insights

Benefits

  • Opportunity to earn equity in Nubank
  • Comprehensive medical, dental, and vision insurance
  • Life insurance and AD&D coverage
  • Generous maternity and paternity leave policies
  • Access to Nucleo, the learning platform for professional development
  • NuLanguage, a language learning initiative
  • NuCare, a mental health and wellness assistance program
  • 401K retirement plan
  • Health Savings Account and Flexible Spending Account options
  • Work-from-home allowance and relocation assistance if applicable
Full Job Description
About the role

The U.S. Market team is launching a differentiated financial product in the largest and most demanding financial market in the world. We're iterating quickly on real customer signals while building systems that will eventually serve customers at Nubank scale. That combination - early-stage velocity, regulatory weight, and high reliability expectations - requires an engineer whose primary mandate is reliability, scale, and operational excellence.

This role exists to make sure the systems we're building today can be trusted in production tomorrow, and to set the bar for what "production-ready" means on this team. The engineer in this role delivers their mandate by writing production code, shaping architecture, and engineering the systems themselves - not by absorbing operational load.
You'll be responsible for

  • Define and operate against SLOs. Establish meaningful SLIs and SLOs with product and engineering partners, manage error budgets, and use them as real inputs to prioritization rather than dashboards no one reads.
  • Build the observability layer. Improve metrics, logs, traces, and alerting so issues are detected early, attributed precisely, and debugged with code-level confidence. Push instrumentation upstream into the services we own.
  • Lead incident response. Act as incident commander when needed, drive blameless postmortems, and turn findings into concrete engineering work that lands. Build the muscle in the team so this isn't centralized in any one person.
  • Reduce toil through engineering. Identify repetitive operational work and eliminate it with software - automation, self-healing behavior, better defaults, better tooling - rather than absorbing it as ongoing overhead.
  • Harden production. Stress-test designs for partial failure, dependency degradation, traffic spikes, and adversarial inputs. Run capacity and performance work before incidents arise. Ensure resiliency primitives are tuned and working correctly.
  • Make change safe and fast. Improve release safety through progressive delivery, feature flags, canaries, rollbacks, and tested migrations. Help the squad ship faster and with lower blast radius.
  • Improve developer experience, especially where it removes operational friction or improves change safety. Where internal tooling or platform gaps slow the team down, build or contribute the fix. Prefer leverage over heroics.
  • Partner across disciplines. Work closely with product, platform, security, compliance, and other engineering teams. Translate reliability and risk tradeoffs into language each audience can act on.
  • Raise the engineering bar. Mentor engineers, review hard designs and PRs, and shape technical standards across the squad. Lead through clarity and judgment, not authority.
We are looking for a person who has

  • A track record of owning services in production - not just shipping them, but being the engineer responsible for how they behave under real load and real failure.
  • Experience defining and operating against SLOs/SLIs, and using error budgets to influence engineering and product decisions.
  • Experience leading incident response and writing postmortems that produced durable improvements.
  • Hands-on experience with observability tooling (metrics, structured logging, distributed tracing) and using it to diagnose nontrivial production issues.
  • Deep system design experience: distributed services, asynchronous messaging, storage tradeoffs, API design, idempotency, consistency, backpressure, and graceful degradation.
  • Significant industry experience building and operating production software systems in a high-ownership engineering environment.
  • Comfort operating in modern cloud environments (e.g., AWS/GCP), containerized workloads, and CI/CD pipelines, and reasoning about their failure modes.
  • Demonstrated technical leadership: influencing architecture across teams, mentoring strong engineers, and making the people around you better.
  • Pragmatism. You can hold a high reliability bar while still helping a fast-moving squad ship.

Location for this opportunity (City, Country)
  • Miami, United States
Our Benefits
  • Opportunity to earn equity at Nu
  • Medical Insurance
  • Dental and Vision Insurance
  • Life Insurance and AD&D
  • Extended maternity and paternity leave
  • Nucleo - Our learning platform of courses
  • NuLanguage - Our language learning program
  • NuCare - Our mental health and wellness assistance program
  • 401K
  • Savings Plans - Health Savings Account and Flexible Spending Account
  • Work-from-home Allowance
  • Relocation Assistance Package, if applicable.


Work Model for this Role

Hybrid 2-3 times/week: Our hybrid work model brings us to the office at least twice a week, on strategic days designed to maximize team connection and collaboration. For more details, visit https://building.nubank.com/nu-hybrid-work-model/

Explore how we build technology at Nubank:

building.nubank.com.br

youtube.com/@building.nubank

Listen to our stories on Spotify
