AI Evaluation Subject Matter Expert

Foxhole Technology

$100K — $130K *
Aerospace & Defense
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's or advanced degree in computer science, AI, engineering, or related field preferred.
  • Experience in evaluating AI and machine learning systems, especially in defense or cybersecurity.
  • In-depth knowledge of AI/ML evaluation methods and performance metrics.
  • Ability to assess AI in operationally constrained environments like limited bandwidth and degraded connectivity.
  • Strong collaboration skills with cross-functional teams in engineering and cybersecurity.

Responsibilities

  • Serve as a senior advisor for AI evaluation and operational suitability analysis.
  • Develop comprehensive AI evaluation strategies and test plans.
  • Assess the operational relevance of AI capabilities and their technical maturity.
  • Evaluate AI tools for various network operations and anomaly detection use cases.
  • Design test scenarios that mimic Navy operating conditions.
  • Define and manage data requirements, validation methods, and performance baselines.
  • Produce detailed technical reports and recommendations for leadership.

Benefits

  • Hybrid work arrangement allowing flexible in-office and remote work.
  • Opportunity to work on advanced defense technologies within the Navy.
  • Engage in cross-disciplinary collaboration with some of the brightest minds in the field.
  • Access to professional development and continuous learning opportunities.
  • Support for obtaining additional certifications or advanced degrees.
Full Job Description
Work Arrangement: Hybrid

Clearance: Active Secret w/TS Capability

Foxhole Technology is seeking an AI Evaluation SME to join an existing program. The AI Evaluation SME will support the assessment, testing, validation, and operational evaluation of artificial intelligence, machine learning, automation, analytics, and decision-support capabilities being considered for or integrated into the Navy's Next Generation CANES environment. This role will help ensure AI-enabled capabilities are mission-relevant, reliable, secure, explainable, measurable, and suitable for deployment within afloat, tactical, disconnected, intermittent, limited-bandwidth, and multi-security-domain environments.

The SME will develop evaluation frameworks, test methods, metrics, datasets, scenarios, risk assessments, and reporting products that help Navy and CACI stakeholders determine whether AI-enabled capabilities improve network operations, cyber defense, system administration, predictive maintenance, anomaly detection, configuration management, mission planning, or other CANES-related functions.

KEY RESPONSIBILITIES:
  • Serve as a senior technical advisor for AI evaluation, test planning, performance assessment, and operational suitability analysis in support of Next Generation CANES modernization.
  • Develop AI evaluation strategies, test plans, measures of effectiveness, measures of performance, success criteria, risk indicators, and evaluation scorecards.
  • Assess AI, machine learning, generative AI, automation, analytics, and decision-support capabilities for operational relevance, technical maturity, cyber risk, reliability, maintainability, explainability, human oversight, and fleet suitability.
  • Evaluate AI-enabled tools for use cases such as network monitoring, cyber anomaly detection, event correlation, predictive maintenance, help desk automation, configuration compliance, system health monitoring, log analysis, vulnerability prioritization, and operational decision support.
  • Design test scenarios that reflect Navy afloat operating conditions, including limited bandwidth, disconnected operations, contested cyber environments, cross-domain constraints, variable data quality, and platform-specific operational limitations.
  • Define data requirements, ground truth methods, evaluation datasets, labeling approaches, validation methods, and performance baselines for AI-enabled capabilities.
  • Assess AI model performance using appropriate metrics such as accuracy, precision, recall, false positive rate, false negative rate, latency, robustness, drift, confidence calibration, explainability, and operational impact.
  • Evaluate risks associated with hallucination, model brittleness, adversarial manipulation, data poisoning, prompt injection, bias, over-reliance, model drift, cybersecurity exposure, and failure modes in operational environments.
  • Support AI red teaming, cyber survivability assessment, adversarial testing, safety reviews, and responsible AI evaluation activities.
  • Develop human-machine teaming concepts, operator-in-the-loop workflows, trust calibration approaches, escalation procedures, and recommended guardrails for AI-enabled tools.
  • Produce technical reports, evaluation findings, executive summaries, test observations, data analysis products, and recommendations for Navy and CACI leadership.
  • Collaborate with systems engineers, cybersecurity engineers, software developers, data scientists, network engineers, operational testers, fleet users, and government stakeholders.
  • Support technical interchange meetings, design reviews, test readiness reviews, operational assessments, demonstrations, and acquisition decision support.
  • Provide SME input on AI governance, responsible AI implementation, model lifecycle management, configuration control, sustainment, monitoring, and continuous evaluation.

REQUIRED QUALIFICATIONS:
  • Bachelor's degree in computer science, data science, artificial intelligence, engineering, mathematics, statistics, cybersecurity, operations research, information systems, or a related technical discipline preferred. Advanced degree preferred.
  • Additional years of directly relevant AI evaluation, test, cybersecurity, Navy, or DoD mission system experience may be considered in lieu of a degree.
  • Demonstrated experience evaluating AI, machine learning, data analytics, automation, or decision-support systems in defense, intelligence, cybersecurity, network operations, enterprise IT, or mission system environments.
  • Strong understanding of AI/ML evaluation methods, test design, performance metrics, validation approaches, model limitations, and operational risk assessment.
  • Experience developing test plans, evaluation frameworks, measures of effectiveness, measures of performance, data collection plans, and technical reports.
  • Familiarity with cybersecurity, enterprise networks, tactical networks, system monitoring, anomaly detection, log analytics, or network operations use cases.
  • Ability to assess AI-enabled systems in operationally constrained environments, including limited bandwidth, degraded connectivity, edge computing, and mission-critical infrastructure.
  • Understanding of responsible AI concepts, including transparency, explainability, human oversight, robustness, security, bias, accountability, and lifecycle monitoring.
  • Experience working with cross-functional engineering, cyber, data science, software, test, and government stakeholder teams.
  • Strong written and verbal communication skills, including the ability to brief complex AI evaluation findings to technical and non-technical audiences.
  • Active DoD Secret clearance.

DESIRED QUALIFICATIONS:
  • Experience supporting Navy, DoD, tactical edge, afloat, C4I, cyber, enterprise IT, or mission command systems.
  • Familiarity with CANES, Navy afloat networks, ADNS, NAVWAR programs, RMF, cyber survivability testing, operational test, developmental test, or fleet experimentation.
  • Experience evaluating generative AI, large language models, retrieval-augmented generation, autonomous agents, AI-assisted cyber tools, AI-enabled network operations, or predictive analytics systems.
  • Knowledge of DoD responsible AI guidance, NIST AI Risk Management Framework concepts, RMF, Zero Trust, DevSecOps, MLOps, model monitoring, or secure software supply chain practices.
  • Experience with data analysis tools, scripting, statistical evaluation, dashboards, test automation, or model performance analysis.
  • Experience with AI red teaming, adversarial ML, cyber test events, operational assessments, or acquisition decision support.
  • Top Secret clearance or SCI eligibility.


Requirements of position: Think analytically, communicate effectively verbally and in writing, make decisions, observe and remember details, interpret data, concentrate on tasks, adjust to change, and handle stress and emotions. Maintain regular attendance and a work schedule, attend meetings, meet deadlines, keyboard/type, handle confidential information, use math/calculations, stay organized, and operate office equipment; may direct others. May be exposed to dust/dirt, humidity, and noise.
