TO-695 Senior AI Operations Engineer

Diverse Agile Solutions

$120K — $150K *
Enterprise Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in a relevant field like Computer Science or Engineering
  • 8+ years of IT engineering experience
  • 5+ years supporting cloud infrastructure
  • 4+ years in AI/ML production environments
  • Strong knowledge in MLOps methodologies
  • Experience with CI/CD automation and production Kubernetes management
  • Excellent troubleshooting and analytical skills

Responsibilities

  • Deploy, operate, and support enterprise AI/ML environments
  • Design scalable MLOps pipelines for continuous deployment
  • Automate AI infrastructure using Infrastructure as Code (IaC)
  • Build CI/CD pipelines for machine learning workflows
  • Implement automated model validation and deployment strategies
  • Monitor model health and performance
  • Collaborate with DevSecOps teams to automate security controls

Benefits

  • Comprehensive benefits package
  • 401(k) plan
  • Paid Time Off (PTO) including Federal Holidays
  • Professional development and certification reimbursement
  • Career advancement opportunities
  • Collaborative, innovation-driven workplace culture
Full Job Description
TO-695 - Senior AI Operations Engineer
Diverse Agile Solutions (DAS)

Location: Washington, DC (Hybrid) (or as required by the customer)
Clearance: Ability to obtain and maintain a Public Trust or applicable Federal clearance
Citizenship: U.S. Citizenship Required
Employment Type: Full-Time, W2
Performance Period: Through the end of the year, with the possibility of extension

We are seeking a Senior AI Operations (AIOps) Engineer to lead the deployment, automation, monitoring, governance, and operational excellence of enterprise Artificial Intelligence and Machine Learning platforms supporting mission-critical federal systems.

This position is ideal for someone who combines expertise in DevOps, MLOps, Cloud Engineering, Site Reliability Engineering (SRE), and AI platform operations to build and run scalable, secure production environments.
Position Overview

The Senior AI Operations Engineer will design, implement, automate, and support enterprise AI infrastructure and operational workflows. This individual will be responsible for deploying and maintaining production AI services, optimizing model performance, managing infrastructure automation, implementing monitoring solutions, and ensuring compliance with federal security requirements.

The engineer will work closely with Data Scientists, Machine Learning Engineers, DevSecOps teams, Cloud Architects, Cybersecurity Engineers, and software developers to operationalize AI solutions across secure cloud environments.
Responsibilities
  • Deploy, operate, and support enterprise AI/ML production environments
  • Design scalable MLOps pipelines for continuous model deployment
  • Automate AI infrastructure using Infrastructure as Code (IaC)
  • Build CI/CD pipelines supporting machine learning workflows
  • Implement automated model validation and deployment strategies
  • Monitor model health, drift, performance, and availability
  • Optimize GPU and compute resource utilization
  • Configure logging, observability, and operational dashboards
  • Manage AI model lifecycle from development through production
  • Support containerized AI workloads using Kubernetes
  • Build automated rollback and disaster recovery capabilities
  • Secure AI infrastructure following Zero Trust principles
  • Implement AI governance and model version management
  • Integrate AI platforms with enterprise applications
  • Maintain operational documentation and runbooks
  • Participate in incident response and root cause analysis
  • Collaborate with DevSecOps teams to automate security controls
  • Optimize cloud costs for AI workloads
  • Ensure compliance with NIST, FedRAMP, and federal security standards
Required Qualifications
  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field
  • 8+ years of IT engineering experience
  • 5+ years supporting cloud infrastructure
  • 4+ years supporting AI/ML production environments
  • Experience deploying enterprise AI solutions
  • Strong knowledge of MLOps methodologies
  • Experience with CI/CD automation
  • Experience managing production Kubernetes clusters
  • Experience supporting containerized workloads
  • Experience with infrastructure automation
  • Strong Linux administration experience
  • Experience with scripting and automation
  • Excellent troubleshooting and analytical skills
  • Experience working in Agile environments
  • Strong communication and documentation skills
Required Technical Skills
Cloud Platforms
  • AWS
  • Azure
  • Google Cloud Platform (GCP)
AI & Machine Learning
  • MLOps
  • Model deployment
  • Model monitoring
  • Model versioning
  • Model registry
  • Feature stores
  • Prompt management
  • Generative AI operations
  • AI inference optimization
DevOps & Automation
  • GitLab CI/CD
  • GitHub Actions
  • Jenkins
  • Terraform
  • Ansible
  • Helm
  • Docker
  • Kubernetes
  • OpenShift
Programming
  • Python
  • Bash
  • PowerShell
  • SQL
  • REST APIs
AI Frameworks
  • TensorFlow
  • PyTorch
  • Hugging Face Transformers
  • LangChain
  • MLflow
  • Kubeflow
Monitoring & Observability
  • Prometheus
  • Grafana
  • ELK Stack
  • Splunk
  • Datadog
  • CloudWatch
  • Azure Monitor
Data Technologies
  • PostgreSQL
  • MongoDB
  • Redis
  • Kafka
  • Snowflake
  • Vector Databases
Security
  • IAM
  • Secrets Management
  • Encryption
  • NIST 800-53
  • FedRAMP
  • Zero Trust Architecture
Preferred Qualifications
  • Experience supporting Federal Government customers
  • Experience operating AI workloads in AWS GovCloud
  • Experience with Azure AI Foundry
  • Experience with Azure OpenAI
  • Experience with Amazon Bedrock
  • Experience with Vertex AI
  • Experience implementing Responsible AI governance
  • Experience supporting Retrieval Augmented Generation (RAG) systems
  • Experience deploying LLM applications
  • Experience with GPU clusters
  • Experience with NVIDIA AI Enterprise
  • Experience with ServiceNow integrations
Preferred Certifications

One or more of the following:
  • AWS Certified DevOps Engineer
  • AWS Certified Machine Learning Engineer
  • Microsoft Azure AI Engineer Associate
  • Microsoft Azure Administrator
  • Certified Kubernetes Administrator (CKA)
  • HashiCorp Terraform Associate
  • Certified Kubernetes Security Specialist (CKS)
  • Google Professional Machine Learning Engineer
  • Security+
  • CISSP
What You'll Do
  • Operationalize enterprise AI platforms
  • Improve reliability of production AI services
  • Build automated AI deployment pipelines
  • Reduce operational overhead through automation
  • Improve model performance and reliability
  • Enhance observability of AI systems
  • Implement secure AI operations
  • Enable scalable AI infrastructure across multiple cloud environments
What We Offer
  • Competitive salary
  • Comprehensive benefits package
  • 401(k)
  • Paid Time Off (PTO)
  • Paid Federal Holidays
  • Professional development and certification reimbursement
  • Career advancement opportunities
  • Collaborative, innovation-driven culture
