MLOps Platform Engineer (SageMaker)

TPI Global (formerly Tech Providers, Inc.)

$120K — $150K *
Plano, TX 75025 (In-Person)
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10-15 years of software engineering experience, especially in cloud or ML platform operations
  • 5+ years of hands-on experience with AWS and deep knowledge of Amazon SageMaker
  • 3+ years of experience building production MLOps pipelines including training and monitoring
  • Familiarity with SageMaker Unified Studio setup and multi-tenant configurations
  • Proficient in Infrastructure-as-Code with Terraform, CDK, or CloudFormation
  • Expertise in IAM design for ML platforms and experience with SSO/SAML integration
  • Experience with data processing using Snowflake as a data source

Responsibilities

  • Set up and configure the SageMaker Unified Studio platform for project provisioning and role management
  • Build and manage MLOps pipelines for end-to-end ML processes using SageMaker Pipelines
  • Oversee the SageMaker Model Registry for model versioning and lineage tracking
  • Configure MLflow for tracking experiments and auto-logging metrics
  • Establish identity and access management frameworks for secure ML operations
  • Implement model serving mechanisms for real-time predictions and batch transformations
  • Monitor model performance and set up alerts for performance drift

Benefits

  • Opportunity for contract extension beyond initial term
  • Work in a prominent tech hub (Plano, TX)
  • Hands-on experience with cutting-edge AWS ML services
  • Join a project focused on transforming ML operations in a large organization
  • Engage in a collaborative and dynamic technological environment

Full Job Description

Job Title: MLOps Platform Engineer (SageMaker)
Job Location: Plano, TX
Project Duration: 12 months with possible extension

Job Summary
What we're looking for
The client is looking for a Senior ML Platform Engineer to design, build, and operationalize an enterprise ML platform on Amazon SageMaker Unified Studio. You will migrate the organization from a fragmented ML toolchain to a unified, governed platform on AWS Landing Zone 2, covering the full ML lifecycle from data discovery through model deployment and monitoring.

What you'll be doing
  • Set up the SageMaker Unified Studio platform: domain configuration, project provisioning, persona-based roles, and multi-environment (Dev, Prod-UAT, Prod) promotion workflows
  • Build MLOps pipelines using SageMaker Pipelines: data extraction from Snowflake, preprocessing, training, evaluation, and model registration
  • Manage the SageMaker Model Registry: cross-account model promotion, versioning, immutability, and lineage tracking
  • Configure MLflow experiment tracking: auto-logging of parameters, metrics, and artifacts
  • Set up identity and access management: Okta SSO, SailPoint entitlements, persona-based execution roles, and service roles for pipelines
  • Build model serving: real-time SageMaker endpoints and batch prediction workflows
  • Set up model monitoring: detection of data drift, model drift, and performance degradation
  • Configure the data catalog: searchable datasets, access-level visibility, access-request workflows, and lineage
  • Own platform operations: observability (CloudWatch, Datadog), logging, custom images, and instance availability

Requirements / What you bring (Must Haves)
- 10-15 years of software engineering experience focused on cloud infrastructure or ML platform operations
- 5+ years hands-on with AWS, including deep expertise in Amazon SageMaker (Studio, Pipelines, Model Registry, Endpoints, Feature Store)
- 3+ years building and operating production MLOps pipelines: training, versioning, deployment, monitoring, rollback
- Experience with SageMaker Unified Studio or Studio Classic: domain/project setup, blueprints, multi-tenant configuration (Studio Classic experience is required; Unified Studio experience is preferred)
- Infrastructure-as-Code with Terraform, CDK, or CloudFormation
- IAM design for ML platforms: execution roles, service roles, cross-account access, Lake Formation, SSO/SAML
- MLflow or equivalent experiment tracking
- SageMaker Pipelines or similar workflow orchestration (Airflow, Step Functions)
- Model serving: real-time endpoints, batch transform, auto-scaling, endpoint monitoring
- Snowflake as a data source for ML pipelines
- Kubernetes (EKS) and container orchestration
- Networking and security: VPC, security groups, private endpoints, cross-account connectivity

Added bonus if you have (Preferred):
- SageMaker Unified Studio domain provisioning, custom blueprints, project standardization
- SageMaker Feature Store for online/offline feature management
- SageMaker Model Monitor: data quality checks, bias detection, drift detection
- AWS Certified Machine Learning - Specialty certification

Meet Your Recruiter

Peter Jackson
