ECS

Monitoring & Telemetry Lead SME

$120K - $150K *
Aerospace & Defense
11 - 15 years of experience
Job Overview by Ladders

Qualifications

  • Current Secret security clearance with ability to obtain Top Secret (TS) clearance and Sensitive Compartmented Information (SCI) status.
  • CompTIA A+ certification.
  • Minimum 12 years of experience designing and implementing enterprise monitoring, telemetry, and observability frameworks.
  • Hands-on expertise with Prometheus, Grafana, OpenTelemetry, Elastic, and Splunk monitoring platforms.
  • Experience integrating observability frameworks with DevSecOps pipelines and AI/ML infrastructure.
  • Strong problem-solving and decision-making skills.
  • Excellent interpersonal and communication abilities.

Responsibilities

  • Define and govern telemetry and observability frameworks for AI/ML operations across multiple classification environments.
  • Establish monitoring standards and health signal baselines for API and service integrations.
  • Design instrumentation patterns for deployment pipelines and service meshes to enhance operational visibility.
  • Implement metrics pipelines to capture critical performance data using various monitoring tools.
  • Conduct telemetry readiness reviews and validate monitoring completeness for serving endpoints.
  • Coordinate with engineering teams to integrate observability within reliability assessments and mission workflows.
  • Produce operational dashboards and assessment reports to improve incident readiness and system reliability.

Benefits

  • Opportunity to work on a key initiative within the U.S. Department of War's AI-First strategy.
  • Collaborate with high-level defense and military personnel.
  • Work with advanced monitoring technologies and frameworks.
  • Impact critical decision-making processes for national defense.
  • Continual professional development and skill advancement opportunities.

Full Job Description

Everforth ECS is seeking a Monitoring & Telemetry Lead SME to work in the National Capital Region covering the Pentagon, Falls Church, and Fairfax. Please Note: This position is contingent upon contract award.

The War Data Platform (WDP) is a key initiative within the U.S. Department of War's (DoW) AI-First strategy introduced in early 2026. The WDP focuses on operational warfighting data and aims to accelerate the deployment of artificial intelligence (AI) on the battlefield. The WDP extends to Unclassified, Secret, and Top Secret environments, and supports collaboration between Combatant Commands, Joint Staff directorates, Senior Executive Service leaders, and operational analysts.

This role defines, architects, and governs telemetry, observability, and service-level indicator frameworks supporting AI and machine learning model-serving operations across all WDP classification enclaves, ensuring enterprise-wide operational visibility, mission assurance alignment, and resilient monitoring of AI/ML-serving infrastructure and API ecosystems.

Responsibilities

• Defines, architects, and governs telemetry, observability, and service-level indicator frameworks supporting AI and machine learning model-serving operations across Unclassified, Secret, and Top Secret enclaves within the War Data Platform (WDP) Core Integration enterprise.
• Establishes monitoring standards, performance indicators, traceability conventions, and health signal baselines for serving application programming interfaces, reverse proxies, model zoo interfaces, and external provider integrations supporting Combatant Commands, Joint Staff elements, and Senior Executive Service decision makers.
• Designs and integrates instrumentation patterns within deployment pipelines, runtime environments, service meshes, and logging and auditing frameworks to provide immediate operational visibility following production releases.
• Implements structured metrics pipelines using platforms such as Prometheus, Grafana, OpenTelemetry, Elastic, Splunk, and DoW-approved monitoring suites to capture latency, throughput, error rates, dependency bottlenecks, cross-domain access behavior, and cyber-relevant anomalies.
• Conducts telemetry readiness reviews, evaluates instrumentation completeness, and validates monitoring coverage for emerging serving endpoints, model artifacts, and external model provider interfaces.
• Coordinates with model-serving engineers, API engineers, DevSecOps teams, pipeline operators, cybersecurity personnel, and platform architects to integrate observability with test and evaluation gates, reliability assessments, and mission assurance workflows.
• Produces operational dashboards, service-level definitions, instrumentation standards, alerting policies, runbooks, and observability assessment reports that strengthen reliability, accelerate incident triage, and elevate mission readiness across all enclaves.
• Advances War Data Platform (WDP) Core Integration program value by delivering resilient, measurable, and domain-compliant monitoring capabilities for enterprise AI/ML model access.
• Performs other duties as assigned.

Qualifications

• Current Secret security clearance with the ability to obtain and maintain a Top Secret (TS) security clearance with Sensitive Compartmented Information (SCI).
• CompTIA A+ certification.
• Minimum 12 years of experience designing, implementing, and governing enterprise monitoring, telemetry, and observability frameworks across multi-domain or classified environments.
• Demonstrated hands-on expertise with monitoring and observability platforms, including Prometheus, Grafana, OpenTelemetry, Elastic, and Splunk, with proven ability to architect structured metrics pipelines and operational dashboards in production environments.
• Experience integrating observability frameworks with DevSecOps pipelines, service meshes, and AI/ML model-serving infrastructure to ensure real-time operational visibility and mission-assurance alignment.
• Strong problem-solving and decision-making capabilities, with a proven ability to weigh the relative costs and benefits of potential actions and identify the most appropriate solution.
• Highly developed interpersonal and oral/written communication skills, with the ability to effectively and professionally interact with a diverse set of stakeholders (from peers to end-users to executive management).

About ECS

ECS is a leading provider of digital solutions and services to the federal government. The company was founded in 2001 by Roy Kapani and has since grown to become a trusted partner to a wide range of government agencies. ECS offers a broad range of services, including cloud computing, cybersecurity, and artificial intelligence. The company has been recognized for its innovative solutions and has won numerous awards, including the AWS Public Sector Partner of the Year award.
Size
2,000 employees