AI Infrastructure Engineer

BNY Mellon

$100K — $130K *
Information Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in computer science or related field; advanced degree preferred.
  • 8-10 years of relevant experience; financial services background a plus.
  • Proficiency in Linux systems administration (RHEL/Ubuntu).
  • Experience managing distributed systems using Kubernetes and Docker.
  • Familiarity with NVIDIA GPU management and device lifecycle operations.
  • Knowledge of CI/CD workflows and tools like GitLab CI or Jenkins.
  • Exposure to cloud platforms (AWS, GCP, Azure) and hybrid environments.

Responsibilities

  • Support enterprise-grade NVIDIA AI infrastructure, focusing on GPU compute and high-performance storage.
  • Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes and Docker.
  • Ensure observability of AI platforms, identifying performance bottlenecks and recommending improvements.
  • Automate infrastructure operations using Python, Bash, Terraform, or Ansible.
  • Scale AI training and inference pipelines, integrating workflows into CI/CD systems.
  • Maintain health and performance of AI systems, proactively addressing reliability.

Benefits

  • Medical, dental, and vision insurance options.
  • 401(k) plan with company match.
  • Generous paid time off policy including holidays and sick leave.
  • Access to professional development and training opportunities.
  • Flexible work environment, potentially with remote options.

Full Job Description

AI Infrastructure Engineer

We're seeking a future team member for the role of AI Infrastructure Engineer to join our Technology team. This role is located in Lake Mary, FL or Pittsburgh, PA.

In this role, you'll make an impact in the following ways:
  • Be hands-on with enterprise-grade NVIDIA AI infrastructure, supporting GPU-based compute, high-performance storage, and network systems designed for ML/AI at scale.
  • Deploy, monitor, and troubleshoot containerized AI workloads using Kubernetes, Docker, and GPU orchestration tools like Run:AI and NVIDIA BCM.
  • Own the observability of our AI platforms: monitor health, identify performance bottlenecks, and make strategic recommendations to drive platform reliability and maturity.
  • Automate infrastructure operations and provisioning using Python, Bash, and tools like Terraform or Ansible to reduce manual toil and accelerate experimentation.
  • Maintain and scale AI training and inference pipelines, integrating infrastructure workflows into CI/CD systems to enable seamless, automated deployment of AI workloads.
  • Working knowledge of NVIDIA and Run:AI software.

To be successful in this role, we're seeking the following:
  • Bachelor's degree in computer science or a related discipline, or equivalent work experience required; advanced degree preferred.
  • 8-10 years of related experience required; experience in the securities or financial services industry is a plus.
  • Experience with Linux administration (RHEL/Ubuntu), shell scripting, and system-level debugging.
  • Proven experience running distributed systems in Kubernetes and containerized environments using Docker.
  • Familiarity with GPU resource management, including NVIDIA GPU Operator and device plugin lifecycle.
  • Experience with CI/CD workflows and infrastructure automation tools such as GitLab CI, Jenkins, Terraform, Helm, or Ansible.
  • Knowledge of networking fundamentals and persistent storage systems.
  • Exposure to cloud platforms (AWS, GCP, Azure) and hybrid GPU environments.
  • Ability to read and support Python code focused on ML/AI pipeline integration.
  • Strong analytical and troubleshooting skills with a collaborative mindset.
  • Effective communication skills and proactive ownership of platform reliability and performance.
