Job Description: Seeking a Senior Data Architect Engineer to support the DIA O&I Enterprise Integration and Assessments Data Team, focused on accelerating dataset understanding and delivering fit-for-purpose machine learning and data solutions.
The role will help the team move from "what data do we have and what can it support?" to "what model or workflow should we implement, and how do we sustain it?" The emphasis is on selecting the right method or tool for the mission problem, including determining when ML is appropriate and when rules, heuristics, or simpler analytics are better suited.
Basic Requirements:
- B.S. in a related field with 5+ years of experience, or 10+ years of experience without a B.S.
- Must be a U.S. Citizen
- Must have an ACTIVE TS/SCI w/ CI Poly Clearance
- Senior-level experience supporting machine learning, data science, data architecture, or data engineering efforts
- Strong Python and SQL experience
- Experience with common machine learning frameworks and libraries, such as scikit-learn, PyTorch, or TensorFlow
- Experience assessing datasets for ML readiness, including data quality, metadata, labeling, ground truth, feature engineering, and constraints
- Experience designing, building, evaluating, and improving ML models based on mission or business use cases
- Experience defining model evaluation metrics and conducting error analysis
- Experience building reproducible technical workflows and documenting implementation approaches
- Experience using AWS SageMaker for experimentation, training, processing, pipelines, model registry, or deployment approaches
- Sound software engineering practices, including Git, readable and modular code, basic testing, documentation, and reproducibility
- Ability to work independently, provide technical recommendations, and interface with Government leads and senior stakeholders with limited oversight
- Familiarity with scalable data processing tools such as Spark or PySpark is a plus
Responsibilities:
- Independently assess and explain datasets, including content, structure, quality, gaps, lineage, metadata, constraints, and known limitations
- Identify what is needed to make datasets ML-ready, including labeling, ground truth, feature creation, and data quality considerations
- Recommend practical approaches for addressing risk considerations such as bias, drift, and model limitations
- Design, build, and iterate ML models appropriate to the data and use case, such as classification, entity or record matching, anomaly detection, and natural language processing, as applicable
- Establish baselines and define evaluation metrics tied to operational utility
- Perform error analysis to guide model improvements and inform recommendations
- Build reproducible workflows that can be rerun and sustained within the customer's operating environment and security constraints
- Support implementation decisions, including batch versus real-time processing, resource and cost tradeoffs, and latency or throughput considerations
- Use AWS SageMaker for experimentation and execution, including notebooks or Studio, training jobs, processing jobs, pipelines or automation, model registry, and deployment approaches as permitted
- Provide clear technical recommendations to the team and Government lead on data strategy, modeling choices, architecture patterns, and implementation plans
- Proactively identify technical risks, opportunities, and recommended paths forward with minimal oversight
ACTIVE TS/SCI with CI POLY CLEARANCE REQUIRED. MUST BE U.S. CITIZEN.
Work Location: ON-SITE SUPPORT at DIA HQ in Washington D.C.