San Francisco, CA / Remote
As a Staff Engineer on the Order Management System (OMS) team, you will be an important technical voice, shaping how we build, operate, and evolve the mission-critical platform that powers commerce at Levi Strauss & Co. You will bring deep software engineering fundamentals to bear on hard problems: designing systems built for scale and untangling complexity so our platform can move faster with confidence. You operate at the intersection of engineering craft and real-world production ownership. You build it, you run it, you make it better.
About the Job
System Design & Architecture
- Lead the design and domain modeling of complex, distributed systems within the OMS ecosystem, producing clear, well-reasoned service boundaries, data contracts, and event-driven interaction patterns that stand up to scrutiny and scale.
- Champion domain-driven design (DDD) principles, working with product and engineering peers to identify bounded contexts, eliminate implicit coupling, and surface shared language across teams.
- Guide the decomposition of monolithic or tightly coupled components into well-defined, independently deployable services, reducing blast radius, improving team autonomy, and enabling faster iteration.
- Author architecture decision records (ADRs) and technical design documents that communicate the "why" alongside the "what," helping teams understand and build on past decisions over time.
Engineering Excellence
- Write, review, and guide production-quality code with an emphasis on clarity, testability, and long-term maintainability, setting the bar for engineering craft on the team.
- Apply modern software engineering practices: CI/CD pipelines, automated testing strategies, feature flagging, progressive delivery, and trunk-based development.
- Identify and eliminate technical debt systematically, balancing short-term velocity with long-term system health through well-argued, incremental improvement plans.
- Establish and promote coding standards, patterns, and best practices across the OMS team that are practical, enforceable, and grounded in production experience.
You Build It, You Run It
- Operate with full production ownership: you design with failure in mind, participate in on-call rotations, and take accountability for the health and reliability of the systems you ship.
- Embed reliability engineering into the development lifecycle, defining SLOs, error budgets, and reliability targets up front rather than as an afterthought.
- Treat runbooks, strategies, and operational documentation as first-class engineering artifacts, keeping them accurate, applicable, and tightly coupled to the systems they describe.
Observability & Operational Intelligence
- Design and implement comprehensive observability strategies (structured logging, distributed tracing, and metrics) so that any failure mode in production can be quickly localized.
- Develop dashboards that give engineers, on-call responders, and partners genuine operational insight into system health: not just uptime pings, but meaningful golden signals and business-relevant goals.
- Define and tune alerting strategies that are signal-rich and noise-poor, ensuring on-call engineers are paged for relevant events, not for symptoms of unrelated upstream noise.
- Champion observability as a design constraint, ensuring new services ship instrumented and that telemetry quality is part of every code review and launch checklist.
Scale & Performance
- Design systems that can sustain peak commercial volumes (seasonal traffic spikes, flash sales, and global expansion) without degraded experience or unplanned downtime.
- Apply scalability patterns: asynchronous messaging, event sourcing, CQRS, caching strategies, database sharding, and graceful degradation, selecting the right tool for each problem.
- Conduct and lead capacity planning exercises, load testing, and performance profiling, translating production data into informed infrastructure and architectural decisions.
Troubleshooting & Incident Response
- Be the senior technical resource during complex production incidents, methodically narrowing hypotheses, leading war rooms, and restoring service while preserving forensic evidence for root cause analysis.
- Facilitate blameless post-incident reviews (PIRs) that produce durable improvements: not just immediate fixes, but systemic changes that reduce the likelihood or impact of recurrence.
- Develop institutional troubleshooting knowledge: document failure modes, known issues, and diagnostic techniques so the entire team grows more capable with each incident.
Collaboration & Mentorship
- Partner with product managers, architects, and other engineers to translate business requirements into clear, achievable technical roadmaps, bridging the gap between strategy and implementation.
- Mentor and level up mid-level engineers through hands-on code review, design feedback, pairing sessions, and direct coaching, building engineering depth across the OMS team.
- Stay current with industry trends in distributed systems, event-driven architecture, and operational tooling, bringing informed perspectives on when to adopt new approaches versus doubling down on proven patterns.
About You
- 10+ years of experience in software engineering with a focus on backend systems, distributed architectures, and platform/product engineering at scale.
- Deep, practical experience designing and modeling complex distributed systems; you can articulate trade-offs and make well-reasoned architectural choices under constraints.
- You have experience operating in a "you build it, you run it" engineering culture. You've been on-call for systems you've built, responded to incidents, and used that experience to make better engineering decisions.
- Experience building for scale and running at scale: you've handled high-throughput, high-availability systems and have the scars and lessons to show for it.
- Expert-level understanding of observability: you can instrument a system from scratch, build meaningful dashboards, tune alerting, and use telemetry data as a primary tool for engineering decisions.
- A systematic, data-driven approach to diagnosing production issues; you stay calm and lead others when systems are on fire.
- Demonstrated experience decoupling tightly coupled systems, whether migrating a monolith, extracting a shared service, or replacing implicit temporal dependencies with well-defined async contracts.
- Experience with event-driven architecture, domain-driven design, and modern API design patterns; you know where these patterns add value and where they add unnecessary complexity.
- Experience with Order Management Systems (OMS), fulfillment pipelines, or commerce platforms is a meaningful plus; familiarity with the domain accelerates your impact but is not a prerequisite for the right engineer.
- Mastery of CI/CD, automated testing, and DevOps practices; you view them as engineering fundamentals, not optional add-ons.
- You can translate technical complexity for non-technical partners and write clearly for engineering audiences; your design docs, ADRs, incident reports, and code reviews all reflect clear thinking.
- Experience working with geographically distributed teams and navigating the complexities of multi-time zone collaboration.
This is a hybrid role based at our San Francisco, CA headquarters. We expect you in the office three days per week, typically Tuesday through Thursday. Note that in-office requirements may vary depending on business needs.
The expected starting salary range for this role is $140,000 - $200,000 per year. We may pay more or less than the posted range based on the location of the role. Where an employee falls within the salary range will depend on factors such as relevant education, qualifications, performance, and business needs.
Levi Strauss & Co. (LS&Co.) offers a total rewards package that includes base pay, incentive plans, 401(k) matching, paid leave, health insurance, product discounts, and more, designed to help you and your family stay healthy, meet your financial goals, and balance the demands of your work and personal life. Available benefits and incentive compensation vary depending on the specifics of the role; details relating to a specific role will be made available upon request.
Location: San Francisco, CA, USA
Full time/part time: Full time
Fill date: This position is expected to be filled by 09/06/2026.
Current LS&Co. employees: apply via your Workday account.