ECS

Senior Tier-4 Model Serving Support Lead

$120K - $150K *
Aerospace & Defense
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of experience in AI/ML platform operations or senior IT support roles
  • Hands-on expertise with Kubernetes, GitLab CI, and Elastic Stack
  • Experience ensuring operational readiness in classified multi-enclave cloud environments
  • Strong problem-solving and decision-making skills
  • CompTIA A+ certification or equivalent
  • Current Secret security clearance with the ability to obtain and maintain a Top Secret (TS) clearance with Sensitive Compartmented Information (SCI)

Responsibilities

  • Oversee Tier-4 escalation coordination for AI model-serving pipelines
  • Direct incident response workflows and validate operational impact
  • Utilize observability tools to diagnose and stabilize serving failures
  • Coordinate with engineering teams to ensure operational readiness
  • Conduct post-incident analyses and document remediation steps
  • Produce mission-critical deliverables for escalation and operational risk
  • Support enterprise release operations and maintain Tier-4 support artifacts

Benefits

  • Collaborative environment with cross-service mission partners
  • Opportunity to work on critical national defense projects
  • Engagement with cutting-edge AI/ML technologies
  • Comprehensive professional development opportunities
  • Support for obtaining and maintaining advanced security clearances

Full Job Description

Everforth ECS is seeking a Senior Tier-4 Model Serving Support Lead to work in the National Capital Region, covering the Pentagon, Falls Church, and Fairfax. Please note: this position is contingent upon contract award.

The Senior Tier-4 Model Serving Support Lead serves as the authoritative escalation owner for AI and machine learning model-serving pipelines, production endpoints, and model zoo operations across the full multi-enclave environment of War Data Platform (WDP) Core Integration. This role bridges platform engineering, cybersecurity, and cross-service mission partners to sustain uninterrupted AI model-serving performance in direct support of Department of War (DoW) missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.

Responsibilities

• Owns Tier-4 escalation coordination for artificial intelligence and machine learning model-serving pipelines, production endpoints, and model zoo operations within War Data Platform (WDP) Core Integration environments supporting Department of War missions, Joint Staff analysts, Combatant Command elements, and Senior Executive Service leadership.
• Directs escalation workflows by activating incident bridges, coordinating engineering response actions, validating operational impact, and aligning escalation playbooks with service-level agreement requirements.
• Applies Kubernetes, GitLab Continuous Integration, VMware environments, Elastic Stack, Prometheus metrics, Grafana dashboards, and enterprise observability tooling to diagnose serving failures, analyze telemetry, and guide stabilization activities across unclassified and higher-domain enclaves.
• Leads coordination with Platform One, Cloud One, multi-national engineering teams, and cross-service mission partners to maintain operational readiness for serving pipelines, cross-domain transfer workflows, API endpoints, and model-runtime components.
• Conducts structured post-incident analysis by collecting operational evidence, reconstructing failure sequences, validating remediation steps, and documenting mission-assurance considerations for future release cycles.
• Produces mission-critical deliverables including escalation playbooks, incident-response documentation, service-level alignment reports, operational risk assessments, and restoration summaries.
• Strengthens program value by reinforcing deployment consistency, advancing mission assurance posture, and sustaining operational continuity across all enclaves.
• Supports enterprise release operations by coordinating readiness checks, validating rollback pathways, and maintaining authoritative Tier-4 support artifacts required for uninterrupted artificial intelligence model-serving performance.
• Performs other duties as assigned.

Qualifications

• Current Secret security clearance with the ability to obtain and maintain a Top Secret (TS) security clearance with Sensitive Compartmented Information (SCI).
• 10 or more years of progressive experience in AI/ML platform operations, enterprise incident management, or senior IT support roles, with demonstrated responsibility for Tier-4 or equivalent escalation ownership in classified or federal government multi-enclave cloud environments.
• Hands-on experience applying enterprise observability and container orchestration tooling, including Kubernetes, GitLab CI, Elastic Stack, Prometheus, and Grafana, to diagnose AI/ML serving failures, analyze pipeline telemetry, and coordinate stabilization activities across Unclassified, Secret, and Top Secret network environments.
• Demonstrated experience coordinating with DoW-authorized DevSecOps platform environments such as Platform One or Cloud One, including participation in cross-enclave release readiness activities, rollback validation, and post-deployment stability verification for AI/ML model-serving workloads.
• CompTIA A+ certification or equivalent, demonstrating validated foundational knowledge of IT systems, hardware, software, and operational support practices.
• Strong problem-solving and decision-making capabilities, with a proven ability to weigh the relative costs and benefits of potential actions and identify the most appropriate solution.
• Highly developed interpersonal and oral/written communication skills, with the ability to effectively and professionally interact with a diverse set of stakeholders (from peers to end-users to executive management).

About ECS

ECS is a leading provider of digital solutions and services to the federal government. The company was founded in 2001 by Roy Kapani and has since grown to become a trusted partner to a wide range of government agencies. ECS offers a broad range of services, including cloud computing, cybersecurity, and artificial intelligence. The company has been recognized for its innovative solutions and has won numerous awards, including the AWS Public Sector Partner of the Year award.
Size
2,000 employees