Make Your Mark: We're looking for a Senior AI/ML Engineer to design, build, and optimize data pipelines that power our next-generation AI-driven accounting agents. You'll lead the development of scalable, high-performance data infrastructure while collaborating closely across teams.
Responsibilities:
- Lead data pipeline development: Build and maintain PySpark ETL pipelines with high data quality and performance
- Manage integrations: Establish robust connections to client data sources via APIs and tools like Fivetran, Plaid, and BlackLine's own internal connector ecosystem
- Ensure reliability: Monitor pipeline performance, automate testing, and validate data accuracy
- Optimize for scale: Implement performance improvements (e.g., CDC mechanisms, indexing strategies) for large-scale datasets
- Collaborate & innovate: Work with business stakeholders to refine data requirements and integrate cutting-edge AI and big data technologies
You'll Get To:
Leadership and Strategy:
- Partner with data science, security, and product teams to set evaluation and governance standards (guardrails, bias, drift, latency SLAs).
- Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments.
- Lead incident response and reliability strategies for ML/AI systems.
AI System Deployment and Integration:
- Collaborate with development teams to integrate AI solutions into existing workflows and applications.
- Ensure seamless integration with different platforms and technologies.
- Define and manage the MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance.
- Build CI/CD pipelines that automate LLM agent deployment, policy validation, and prompt evaluation workflows.
- Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics.
- Implement logging, metering, and auditing for agent behavior, function calls, and compliance alignment.
- Create scalable observability systems that track conversation outcomes, factual accuracy, latency, escalation patterns, and safety events.
- Architect end-to-end guardrails for AI agents including prompt injection protection, identity-aware routing, and tool usage authorization.
- Collaborate cross-functionally to standardize authentication, authorization, and session governance for multi-agent runtimes.
Model Deployment and Integration:
- Architect and standardize model registries and feature stores to support version tracking, lineage, and reproducibility across environments.
- Lead the deployment of machine learning models into production environments, ensuring scalability, reliability, and efficiency.
- Collaborate with software engineers to integrate machine learning models into existing applications and systems.
- Implement and maintain APIs for model inference.
Infrastructure and Environment Management:
- Design and manage training infrastructure including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling.
- Implement CI/CD for model workflows using pipelines integrated with model validation, bias checks, and rollback automation.
- Build standardized experimentation frameworks for reproducible training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
- Manage and optimize the infrastructure required for machine learning operations in the cloud.
- Work closely with other teams to ensure the availability, security, and performance of machine learning systems.
Monitoring and Maintenance:
- Implement robust monitoring solutions for deployed machine learning models to detect issues and ensure performance.
- Collaborate with data scientists and engineers to address and resolve model performance and data quality issues.
- Conduct regular system maintenance, updates, and optimizations to ensure optimal performance of machine learning solutions.
Automation and Orchestration:
- Develop and maintain automation scripts and tools for managing machine learning workflows.
- Implement orchestration systems to streamline the end-to-end machine learning lifecycle, from data preparation to model deployment.
Collaboration with Data Science Teams:
- Collaborate with data scientists to understand model requirements and constraints for deployment.
- Facilitate the transition of machine learning models from research to production, ensuring scalability and efficiency.
Performance Optimization:
- Identify and implement optimizations to enhance the performance and efficiency of machine learning models in production.
- Conduct performance analysis and implement improvements based on resource utilization metrics.
Security and Compliance:
- Implement security measures to protect machine learning systems and data.
- Ensure compliance with regulatory requirements and industry standards related to machine learning and data privacy.
- Integrate audit controls, metadata storage, and lineage tracking across ML and AI workflows.
- Ensure complete monitoring and feedback loops including event logs, evaluations, and automated retraining triggers.
- Enforce secure deployment patterns with Infrastructure-as-Code and cloud-native secrets management.
- Define SLAs, error budgets, and compliance reporting mechanisms for ML and AI systems.
What You'll Bring:
- 3+ years of programming experience in languages such as Python, Java, or Scala.
- Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow).
- Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure).
- Deep familiarity with LangChain, LangGraph, ADK, or similar frameworks for agentic system runtime management.
- Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation.
- Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking.
- Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads.
- Proficiency in containerization technologies (e.g., Docker, Kubernetes).
We're Even More Excited If You Have:
Operations and Infrastructure:
- Proficiency in scripting languages (e.g., Bash, Python) for automation.
- Experience with workflow orchestration tools (e.g., Apache Airflow).
- Expertise in managing and optimizing cloud-based infrastructure.
- Familiarity with DevOps practices and tools for automated deployment.
- Understanding of network configurations and security protocols.
Problem-solving and Critical Thinking:
- Ability to define problems, collect and analyze data, and propose innovative solutions. Strong critical thinking skills to evaluate models, identify limitations, and
Adaptability and Learning Agility:
- Comfortable working in a fast-paced, rapidly evolving environment. Proactive in staying up to date with the latest trends, techniques, and technologies in AI/data science.
Salary Range: USD $145,000.00/Yr. - USD $182,000.00/Yr.
Pay Transparency Statement: Placement within this range depends upon several factors, including the applicant's prior relevant job experience, skill set, and geographic location. In addition to base pay, BlackLine also offers short-term and long-term incentive programs, based on eligibility, along with a robust offering of benefit and wellness plans.