Smartsheet

Senior Manager, Engineering - Observability Platform (Remote Eligible)

Smartsheet
$205K - $275K *
US-Anywhere
Remote in Bellevue, WA
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of experience in software or platform engineering, focusing on distributed systems and backend services.
  • 3 years of proven engineering management experience, covering team building and performance management.
  • Expertise in observability tools such as Datadog, OpenSearch, and distributed tracing technologies like OpenTelemetry.
  • Experience managing observability platforms in high-availability production environments.
  • Strong track record in leading complex, cross-functional infrastructure projects with significant ambiguity.
  • Strong communication skills for engaging with both technical and non-technical audiences, including executives.
  • History of managing vendor relationships and third-party integrations in a platform setting.

Responsibilities

  • Lead the development of a unified observability platform serving various engineering teams.
  • Define and evolve the platform’s technical roadmap, ensuring scalability and coherence.
  • Establish platform standards and contribute to architectural direction.
  • Grow the team by hiring senior engineers and establishing effective practices across globally distributed stakeholders.
  • Drive the design and implementation of a centralized observability infrastructure.
  • Oversee SLO/SLA definition and tooling for reliability across the platform.
  • Collaborate with AI teams to integrate advanced observability capabilities.

Benefits

  • Employer-subsidized medical, vision, and dental coverage for full-time employees.
  • 401(k) match to assist with future savings.
  • Monthly stipend to enhance work productivity.
  • Flexible Time Away Program and sick leave.
  • Life insurance and disability plans sponsored by Smartsheet.
  • 12 paid holidays annually and up to 24 weeks of parental leave.
  • Personal paid Volunteer Day to engage with the community.
  • Professional development opportunities, including access to online courses.
  • Company-funded perks such as counseling memberships and retail discounts.
  • Remote work options available for eligible roles.
Full Job Description
The Observability Platform team is seeking a Senior Manager of Engineering to build and lead a centralized platform capability that gives Smartsheet full-stack visibility into our most complex and consequential systems. This role owns the engineering strategy and execution for a dedicated platform that consolidates multiple tooling platforms and serves engineering teams across the company, including the Data & AI Platform, Commerce, and Infrastructure pillars.

You will lead a team with strategic ownership spanning metrics, distributed tracing, alerting, log analytics, SLO/SLA management, and AI/ML observability integrations tied to SmartAssist and our agentic AI workstreams on Amazon Bedrock and MLflow. This is a high-leverage, high-visibility role at the intersection of platform reliability and AI-native engineering.
You Will:
Team & Platform Leadership
  • Lead a team of engineers focused on observability platform engineering, driving build-out of a unified observability stack used by all engineering teams at Smartsheet.
  • Own and evolve the platform's technical roadmap, consolidating multiple tooling platforms and AI observability tooling into a coherent, scalable capability.
  • Define platform standards, contribute to architectural direction, and ensure the team operates with engineering rigor and strong operational habits.
  • Build and scale the team, hiring senior engineers and establishing effective global practices across distributed stakeholders.
Observability Engineering
  • Lead design and delivery of centralized observability infrastructure covering metrics pipelines, distributed tracing, alerting frameworks, and log analytics across Smartsheet services.
  • Drive SLO/SLA definition and tooling for platform-wide reliability visibility, partnering closely with infrastructure, platform engineering, and on-call teams.
  • Own governance including instrumentation standards, cost optimization, and rollout of advanced capabilities such as APM, RUM, and custom dashboards.
  • Lead architecture, scaling, and operational practices for log analytics across high-throughput production workloads.
  • Establish shared observability libraries, agents, and SDKs that reduce instrumentation burden for application engineering teams.
AI Observability
  • Build and maintain AI/ML observability integrations in partnership with the AI Platform team.
  • Partner with the Data & AI Platform team to integrate MLflow tracing, Inference Tables, and LLM-as-judge evaluation pipelines into the observability stack.
  • Develop dashboards and alerting for agentic AI workloads, including latency, token consumption, error rates, and evaluation metric drift.
  • Contribute to the AI governance and cost observability program, providing telemetry for model usage, cost attribution, and compliance reporting.
Cross-Functional Partnership & Execution
  • Serve as the primary engineering partner for platform consumers across Data & AI, Commerce, Infrastructure, and Security teams, ensuring observability needs are met across workstreams.
  • Lead complex, cross-functional observability projects with high ambiguity, managing delivery risk, communicating clearly to senior stakeholders, and building alignment across teams.
  • Work with delivery partners to coordinate instrumentation across platform modernization and migration workstreams.
  • Contribute to quarterly and annual platform goals, reporting on key reliability and observability metrics to engineering leadership.
  • Communicate platform status, risks, and roadmap progress to Engineering leadership and more senior audiences in a clear, executive-ready format.
Operational Excellence
  • Embed on-call culture and incident management discipline into the team, ensuring clear runbooks, fast MTTR, and post-incident learning loops.
  • Drive cost governance for observability tooling, including spend optimization and efficient resource management.
  • Champion AI-assisted engineering practices within the team, applying tooling and automation to reduce toil and accelerate delivery.
You Have:
Required
  • 10+ years of software or platform engineering experience, with strong fundamentals in distributed systems, infrastructure, and backend services.
  • 3 years of engineering management experience, including direct team building, performance management, and cross-functional delivery ownership.
  • Deep hands-on expertise with observability tooling: Datadog (APM, metrics, logs, alerting), OpenSearch or Elasticsearch, distributed tracing (OpenTelemetry or equivalent), and SLO/SLA management at scale.
  • Proven experience operating observability platforms for high-availability, high-throughput production environments.
  • Experience building and scaling engineering teams with a distributed or international focus.
  • Strong execution track record on complex, cross-functional infrastructure programs with high ambiguity.
  • Clear, direct communication (written and verbal) with both technical and non-technical audiences, including leadership and executive stakeholders.
  • Proactive risk identification and status communication without prompting.
  • Experience managing vendors, external delivery partners, and third-party integrations in a platform context.
Preferred
  • Hands-on experience with AI/ML observability: MLflow tracing, LLM evaluation pipelines, or observability for agentic AI systems.
  • Familiarity with Amazon Bedrock, ECS Fargate, or LangGraph-based multi-agent architectures.
  • Experience with cloud cost governance and FinOps practices for observability tooling.
  • Exposure to data platform observability and data quality monitoring in a lakehouse context.
  • Experience establishing internal developer platforms, shared libraries, or platform-as-a-service offerings for application teams.
  • Prior work in SaaS environments with enterprise compliance requirements (SOC 2, FedRAMP, HIPAA).
Education & Eligibility
  • CS, Engineering, or equivalent degree, or commensurate practical experience.
  • Legally eligible to work in the U.S. on an ongoing basis.


Current US Perks & Benefits:
  • Employer subsidized medical/vision and dental coverage for full-time employees
  • 401k Match to help you save for your future (50% of your contribution up to the first 6% of your eligible pay)
  • Monthly stipend to support your work and productivity
  • Flexible Time Away Program, plus Sick Time Off
  • US employees are automatically covered under Smartsheet-sponsored life insurance, short-term, and long-term disability plans
  • US employees receive 12 paid holidays per year
  • Up to 24 weeks of Parental Leave
  • Personal paid Volunteer Day to support our community
  • Opportunities for professional growth and development including access to Udemy online courses
  • Company Funded Perks, including a counseling membership, local retail discounts, and your own personal Smartsheet account
  • Teleworking options from any registered location in the U.S. (role specific)

Smartsheet provides a competitive base salary range for roles that may be hired in the different geographic areas where we are licensed to operate our business. Actual compensation is determined by several factors including, but not limited to, level of professional and educational experience, skills, and specific candidate location. In addition, this role will be eligible for a market-competitive incentive opportunity.

US Base Salary Pay Range

$205,000-$275,000 USD

About Smartsheet

Smartsheet is a software as a service (SaaS) company that provides businesses with collaboration and work management tools. The company's platform allows teams to manage and automate workflows, projects, and processes. Smartsheet's software is used by over 90% of the Fortune 100 companies and has over 15 million registered users. The company was founded in 2005 and is headquartered in Bellevue, Washington.
Size: 2,539 employees
Market Cap: $4.9 billion
Net Income: -$114.4 million
Founded: 2005
5 Year Trend: +52.4%
Revenue: $354.1 million
NASDAQ
