About the Team
The AI Model Serving team is the engine behind every production Workday agent and machine learning use case. We own the services that power all production AI workloads, serving as the gateway to vendor-hosted LLMs on GCP and AWS Bedrock and operating the model deployment platform where Workday hosts and scales its models.
About the Role
As a Principal Software Development Engineer on the AI Model Serving team, you will be a technical leader who helps shape the vision and direction of the platform alongside the engineering manager. You will play a central role in making critical design decisions, driving outcomes across the team, and setting a positive and inclusive team culture. In addition to the model serving platform, the team also owns the production model registry at Workday, and you will help guide its evolution and ensure it meets the needs of ML teams across the organization.
Your work will directly impact Workday's ability to serve AI at scale — from traditional ML models to the latest large language models powering Workday's agents.
Key Responsibilities:
Help set the product vision for the AI Model Serving platform in partnership with the engineering manager, bringing a product-oriented mindset to infrastructure decisions.
Lead the team technically by making critical design decisions that drive performance, reliability, and scalability across the platform.
Design, implement, and maintain large-scale systems that enable moving ML models to production.
Write design documents to build consensus for new system components and enhancements to existing components.
Evaluate and adopt new technologies made available within Workday and across the broader industry.
Troubleshoot, improve, and scale continuous integration software pipelines.
Develop relationships with software engineers, machine learning engineers, and data scientists on partner teams.
Respond to alerts and debug production issues to maintain platform health and reliability.
Review pull requests and enforce consistency, performance, readability, and security across code bases.
Develop documentation to share knowledge with other engineers.
About You
Basic Qualifications
8+ years of related work experience in software development, with a focus on building and operating large-scale distributed systems.
Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
Other Qualifications
Software Development and Distributed Systems: Deep experience designing, building, and scaling production-grade distributed systems. You understand the full software development lifecycle — from coding standards and testing to code reviews, source control management, and deployment — and you can apply that knowledge to complex, high-throughput platforms.
Product Thinking and Design: You bring a product-oriented perspective to platform engineering. You can identify what matters most for internal and external users of the platform, translate those insights into technical direction, and make design decisions that balance usability, performance, and long-term maintainability.
Python: Deep proficiency in Python, with extensive experience writing production-level code and building systems in Python-based frameworks.
LLMs and Traditional ML Models: Familiarity with both large language models and traditional machine learning models, including how they are served, scaled, and monitored in production environments. You understand the operational differences and can design platform abstractions that serve both effectively.
Ray Serve: Experience with Ray and Ray Serve for distributed model serving at scale. You understand how to operate, tune, and scale Ray Serve clusters in production.
Prometheus: Experience with Prometheus for monitoring and observability of distributed systems. You can design and maintain monitoring strategies that provide clear insight into system health, performance, and cost.
Excellent written and verbal communication skills, including the ability to write clear design documents, articulate complex technical ideas, and build consensus across teams.
A collaborative approach to engineering, with experience mentoring other engineers and fostering an inclusive team environment.
Workday Pay Transparency Statement
The annualized base salary ranges for the primary location and any additional locations are listed below. Workday pay ranges vary based on work location. As a part of the total compensation package, this role may be eligible for the Workday Bonus Plan or a role-specific commission/bonus, as well as annual refresh stock grants. Recruiters can share more detail during the hiring process. Each candidate’s compensation offer will be based on multiple factors including, but not limited to, geography, experience, skills, job duties, and business need, among other things. For more information regarding Workday’s comprehensive benefits, please .
Primary Location: CAN.ON.Toronto
Primary CAN Base Pay Range: $168,000 - $252,000 CAD
Additional CAN Location(s) Base Pay Range: $168,000 - $252,000 CAD
Our Approach to Flexible Work
With Flex Work, we’re combining the best of both worlds: in-person time and remote. Our approach enables our teams to deepen connections, maintain a strong community, and do their best work. We know that flexibility can take shape in many ways, so rather than a number of required days in-office each week, we simply spend at least half (50%) of our time each quarter in the office or in the field with our customers, prospects, and partners (depending on role). This means you’ll have the freedom to create a flexible schedule that caters to your business, team, and personal needs, while being intentional to make the most of time spent together. Those in our remote "home office" roles also have the opportunity to come together in our offices for important moments that matter.