Uniphore

Principal Site Reliability Engineer

Uniphore | $232K - $335K
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years in DevOps/SRE/Platform Engineering with significant impact in staff or principal roles.
  • Proficient in writing production-level Go and managing Go services.
  • Extensive experience with Kubernetes, including building and maintaining Operators.
  • Expertise in cloud infrastructure management on AWS/GCP/Azure, including Terraform.
  • Demonstrated excellence in incident management and system design for production quality.
  • Strong software engineering fundamentals, particularly in API design and observability.
  • Effective technical writing skills for documentation and operational procedures.

Responsibilities

  • Define and implement long-term architectural strategies for multi-cloud platforms.
  • Lead hands-on projects to enhance infrastructure reliability, automation, and performance.
  • Develop multi-year technical roadmaps for scalability, reliability, and security.
  • Provide technical leadership through code contributions and design reviews.
  • Maintain stewardship of systems, ensuring alignment with the architectural vision.
  • Advise engineering leadership on technical strategy and influence product direction.
  • Participate in on-call duties, owning escalations and implementing architectural fixes.

Benefits

  • Competitive base pay and annual incentive opportunities.
  • Pre-IPO stock options to incentivize long-term growth.
  • Comprehensive medical, dental, and vision benefits.
  • 401(k) with employer matching to support retirement savings.
  • Generous paid time off and paid holidays for work-life balance.
  • Paid leave policies, including a day off for your birthday.

Full Job Description

What You'll Be a Part Of:
Uniphore is one of the largest B2B AI-native companies - decades-proven, built for scale and designed for the enterprise. The company drives business outcomes across multiple industry verticals and enables the largest global deployments. Uniphore infuses AI into every part of the enterprise that impacts the customer through our multimodal architecture combining Generative AI, Knowledge AI, Emotion AI, workflow automation and co-pilot guidance.

About the Role:
We're looking for a Principal Site Reliability Engineer to join our Platform Engineering team - someone as comfortable writing production Go as designing and operating cloud infrastructure. The highest-leverage work here isn't a runbook; it's the service that enforces the runbook automatically. You'll write Go that runs in production and multiplies your impact across hundreds of services.

You'll build the standards, frameworks, automations, agentic workflows, and self-service capabilities that make engineering teams autonomous while maintaining enterprise-grade reliability and security. You won't just define standards - you'll implement them in code: a Kubernetes Operator that enforces service readiness criteria, a service that surfaces SLO health across the fleet, an internal platform service that automates task execution.

You'll collaborate with feature teams as an expert advisor and standard-setter, helping them build operational maturity while you maintain oversight of our single/multi-tenant, multi-cloud infrastructure. You'll be a bridge between software development and systems operations, focused on large-scale, resilient, automated infrastructure rather than daily firefighting.

This is a senior individual-contributor role. You will not have direct reports. Your leadership is technical - exercised through architecture, production code, design reviews, and mentorship. This role participates in our on-call rotation, which covers all production systems. As a Principal, you'll own the hardest escalations and use what you learn on-call to drive the architectural fixes that eliminate whole classes of incidents.

Responsibilities:

Invent:

  • Define and execute long-term architectural strategy for our multi-cloud platforms.
  • Lead hands-on implementation of critical infrastructure projects, focusing on reliability, automation, and performance at scale.
  • Own multi-year technical roadmaps that establish the vision for infrastructure scalability, reliability, security, and engineering velocity.

Own:

  • Provide technical leadership through design reviews and code contributions; set technical direction, eliminate architectural barriers to execution, and drive toward simplicity.
  • Maintain end-to-end technical stewardship of your systems, keeping execution aligned with architectural vision and best practices.
  • Act as a key technical advisor to Engineering Leadership and Product Management, influencing the strategic direction of Uniphore.
  • Lead design reviews across Infrastructure with a focus on automation, scalability, and reliability, and align architectural roadmaps across teams.
  • Partner with Security to build secure-by-default systems and remediate weaknesses.
  • Own the reliability of the systems under your technical stewardship.
  • Create the technical clarity - vision, standards, and tooling - that lets feature teams build, own and operate their services.
  • Participate in fleet-wide on-call, owning critical escalations across all production systems and converting recurring failure modes into permanent architectural fixes.

Teach:

  • Establish and evangelize design principles for reliable, secure, scalable systems.
  • Grow other engineers through technical mentorship, architectural guidance, and design review.

Requirements:

  • 10+ years in DevOps/SRE/Platform Engineering, with demonstrated Staff- or Principal-scope impact and a track record of transforming operational models.
  • Production Go: you write Go regularly, understand its concurrency model, and are comfortable owning Go services in production.
  • Kubernetes depth: operational expertise plus the ability to extend it - you understand the controller-runtime model and could write or maintain a Kubernetes Operator.
  • Cloud & infrastructure: expert-level AWS/GCP/Azure, Terraform, and multi-cloud architecture, with strong cost-optimization instincts.
  • Production excellence: deep experience with incident management, RCA processes, and on-call system design.
  • Software engineering fundamentals: API design, testing, observability instrumentation, and service lifecycle ownership - you treat internal tooling with the same rigor as customer-facing software.
  • Standards & documentation: strong technical writing; you create operational procedures teams can self-execute.
  • Architecture & planning: RFC/PRD review experience; you catch operational problems at design time.
  • Collaboration & coaching: you build team capability through tooling and knowledge transfer rather than doing the work for them.

Nice to Haves:

  • Building Kubernetes Operators, controllers, or admission webhooks (controller-runtime, kubebuilder).
  • Contributions to open-source infrastructure tooling.
  • AWS Solutions Architect Professional or equivalent GCP/Azure certifications.
  • Kubernetes certifications (CKA, CKAD, CKS).
  • Platform engineering, developer experience, or internal developer portals (Backstage, etc.).
  • GitOps patterns (ArgoCD, Flux) and policy-as-code tooling (OPA, Kyverno).

Why You'll Love This Role:

  • Your code is your leverage. Solutions you ship multiply across dozens of services and teams - you prevent entire classes of problems rather than patching instances.
  • You'll shape the platform strategy. You'll drive the transformation from reactive support to strategic platform partnership, with platform engineering embedded in planning to prevent downstream issues.
  • You'll tackle the hardest problems: multi-tenant architecture scaling, cross-service observability, and reliability challenges that affect our largest enterprise deployments.
  • You'll set the bar. You'll define the standards, incident-management frameworks, and service-ownership model that let teams graduate to full operational independence.

Hiring Range: $232,900 - $335,811 OTE for the primary location (Palo Alto, CA)

Benefits:

In addition to competitive base pay, this position also includes an annual incentive opportunity based on target achievement, pre-IPO stock options, benefits including medical, dental, vision, 401(k) with a match, and more, plus generous paid time off, paid holidays, paid day off for your birthday and other paid leave policies to support employees through all phases of life.

Location preference:
USA - CA - Palo Alto

About Uniphore

Uniphore is a global provider of conversational AI solutions. The company was founded in 2008 and has offices in India, the United States, and Singapore. Uniphore's solutions use natural language processing and machine learning to enable businesses to automate customer service and support operations. The company's clients include some of the world's largest companies in industries such as banking, insurance, and telecommunications. Uniphore has won numerous awards for its innovative solutions and has been recognized as a leader in the conversational AI market.
Size: 500 employees
Founded: 2008
