Role Overview
Title: Platform Engineer, AI Agent Systems
Hours: Full-time; Salaried
Location: Salt Lake City, UT (hybrid in-office)
Benefits Eligible: Yes
Manager: Justin Hanselman
Zanskar uses proprietary geophysical models and subsurface data to find geothermal resources faster than anyone else. We're building the infrastructure that lets our team put AI agents to work on that data - safely. We need a Platform Engineer to own that infrastructure: the systems that let agents operate on sensitive company data and models without creating pathways for that data to be deleted, altered, or exfiltrated. The work isn't theoretical - we have proprietary subsurface models and geophysical datasets that cannot leave our environment, and we need agents that can work with them in production today.
Outcomes - Problems you'll solve
Success in this role centers on two tightly related workstreams that share the same hard underlying problem: letting agents operate on sensitive company data and models safely. You'll design and operate the production systems that run agentic workflows inside our infrastructure as well as support a self-service layer for our internal teams to deploy agent-based tools and agent-built internal apps.
You'll own the access control architecture that governs what agents can read, write, and call in both production and sandboxed experiments - explicit trust boundaries, revocable credentials, and audit trails that hold up under scrutiny. Throughout, you'll partner directly with scientists, engineers, finance, legal, and comms to understand what they need, what they'll accidentally break, and how to make the on-ramp fast without compromising the guardrails.
Within six months, you'll have foundational agent-deployment infrastructure running in production, a sandbox environment that internal teams are actively using to build tools, and an access-control and observability model that lets Zanskar expand agent usage safely as the company grows.
Competencies - What we're looking for
- A Platform Engineer First: You have 3+ years building and operating internal APIs, platform services, or backend infrastructure. You reach for containerized environments naturally, think in terms of service-to-service auth and secrets management, and you know what observability actually requires - not just that it's important.
- Fluent in How Agents Fail: You have worked with LLM tool-use or function-calling in a production or near-production context. You understand hallucinated tool calls, unbounded loops, and context bleed - not just as concepts, but as things you've had to diagnose and contain. You have hands-on experience with at least one agentic framework and its operational tradeoffs.
- Security-Minded by Default: You don't treat access controls as a bolt-on. You design systems where the path of least resistance is also the safe path. You've thought about what happens when a component is compromised, not just when it works correctly.
- Comfortable With Unsolved Problems: There is no playbook for safely deploying proprietary AI agents at the speed a small technical team needs. You're the kind of engineer who figures out the right approach rather than waiting for one to arrive. You do your best work when the problem is real and the constraints are hard.
Nice to have:
- Experience with multi-tenant systems where isolation between environments is a hard requirement, not a best-effort goal.
- Familiarity with model serving infrastructure and the patterns that apply when the model itself is proprietary.
- Experience with data access patterns in scientific or geospatial contexts - raster/vector data, large file stores, or similar environments where data sensitivity and file size create real engineering constraints.
- Experience managing LLM provider integrations at the API layer - cost observability, rate limiting, failover logic, or gateway tooling. Single-provider depth is fine; understanding when and why you'd need to abstract to multi-model is what matters.
- Familiarity with or interest in frontend development - the systems you're building will help users deploy their apps to share across the company.
- Experience in Python, TypeScript, Golang, Pulumi, or Kubernetes - you'll be involved with infrastructure tooling that uses aspects of each while setting up frameworks for the company.
Location, Salary, and Benefits
- The position is located in Salt Lake City, UT
- Full-time; salaried
- Paid holidays
- 18 days PTO + PTO accrual increase based on tenure
- Medical, Dental & Vision coverage
- Equity Packages
- 401k
- Paid Parental Leave
- A direct impact in displacing carbon emissions, and growth opportunities in a growing startup environment