Quantiphi

Architect - Platform Engineer

Quantiphi, $120K - $160K *
US-Anywhere, Remote in United States
Enterprise Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years of experience in technology roles, with a focus on infrastructure for AI workloads
  • Hands-on expertise with Slurm and distributed training environments
  • Deep knowledge of the NVIDIA GPU ecosystem, including CUDA and cuDNN
  • Strong foundation in Linux systems, performance tuning, and multi-GPU optimization
  • Experience deploying GenAI workloads, particularly LLM fine-tuning and RAG pipelines
  • Familiarity with Infrastructure-as-Code tools like Terraform and Ansible
  • Experience with cloud GPU environments such as GCP, Azure, and AWS

Responsibilities

  • Design and implement scalable infrastructure for LLM and GenAI workloads in multi-GPU settings
  • Optimize GPU performance for distributed training tasks
  • Manage compute-intensive jobs using Slurm on OpenShift/Kubernetes
  • Collaborate with teams to deploy and support models in production
  • Develop reusable infrastructure templates with tools like Terraform and Helm
  • Drive the adoption of Infrastructure as Code (IaC) practices
  • Automate development processes through CI/CD pipelines

Benefits

  • Impact at a fast-growing AI-first digital engineering company
  • Opportunities to upskill and tackle complex challenges with skilled colleagues
  • Work with innovative teams in a research-focused environment with over 60 patents filed
  • Gain exposure to cutting-edge AI, ML, data, and cloud technologies in Fortune 500 settings
Full Job Description
Role: Architect - Platform Engineer

Experience Level: 10+ yrs

Work Location: US East/Canada (Remote)

Role Overview:

We are looking for a highly skilled Architect - Platform Engineer to design, optimize, and scale infrastructure for GenAI and LLM workloads. This role is ideal for someone with deep hands-on experience in GPU profiling, distributed training, and high-performance compute environments. You will work with Architects from other specialties, such as data engineering, software engineering, and ML engineering, to create platforms, solutions, and applications that cater to the latest trends.

You'll play a key role in building out GenAI platform foundations, supporting production-grade deployments, and partnering closely with data science, MLOps, and application teams to bring cutting-edge AI solutions to life.

Key Responsibilities:
  • Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments
  • Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads
  • Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments
  • Enable and optimize the NVIDIA GPU stack (CUDA, cuDNN, NCCL, Triton, RAPIDS, etc.)
  • Collaborate with cross-functional teams to deploy models in research and production environments
  • Build and support GenAI pipelines (fine-tuning, RAG, multi-modal inferencing, LLMOps)
  • Develop reusable infrastructure templates using tools like Terraform and Helm
  • Contribute to internal innovation (PoCs, workshops) and support client-facing delivery engagements
  • Develop and deliver automation software required for building & improving the functionality, reliability, availability, and manageability of applications and cloud platforms
  • Champion and drive the adoption of Infrastructure as Code (IaC) practices and mindset
  • Design, architect, and build self-service, self-healing, synthetic monitoring and alerting platforms and tools
  • Automate development and testing processes through CI/CD pipelines (Git, Jenkins, SonarQube, Artifactory, Docker containers)
  • Build a container hosting platform using Kubernetes
  • Introduce new cloud technologies, tools, and processes to keep innovating in the commerce area and drive greater business value
  • Lead technical discussions with clients on architecture design and troubleshooting, and proactively provide solutions as required


Basic Qualifications:
  • Strong experience with Slurm and distributed training environments
  • Hands-on expertise with Red Hat OpenShift and/or Kubernetes
  • Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton/TensorRT)
  • Strong foundation in Linux systems, performance tuning, and multi-GPU optimization
  • Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems)
  • Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)
  • Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters
  • Serve as a mentor or guide for senior resources / team leads
  • Lead the technical discussion regarding architecture design


Other Qualifications (OQs):
  • Experience with NVIDIA NIMs, DGX systems, or GPU-accelerated containers
  • Knowledge of LLMOps frameworks and MLOps integration
  • Familiarity with vector databases and retrieval systems for RAG architectures
  • Comfortable working in client-facing environments and collaborating with AI solution teams


Healthcare Domain Experience (Nice to Have):
  • Experience working with FHIR R4, HL7 v2, or SMART on FHIR
  • Integration with EHR systems (e.g., Epic)
  • Understanding of HIPAA compliance and healthcare data privacy
  • Exposure to clinical workflows, CDS Hooks, or patient-facing applications
  • Experience building clinical decision support systems or healthcare interoperability solutions


What's in it for YOU at Quantiphi:
  • Make an impact at one of the world's fastest-growing AI-first digital engineering companies.
  • Up-skill and discover your potential as you solve complex challenges in cutting-edge areas of technology alongside passionate, talented colleagues.
  • Work where innovation happens - work with disruptive innovators in a research-focused organization with 60+ patents filed across various disciplines.
  • Stay ahead of the curve: immerse yourself in breakthrough AI, ML, data, and cloud technologies, and gain exposure working with Fortune 500 companies.


If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

About Quantiphi

Quantiphi is an artificial intelligence and machine learning services company that helps businesses transform their operations through the use of AI. The company provides a range of services, including data engineering, machine learning, computer vision, natural language processing, and predictive analytics. Quantiphi was founded in 2013 and is headquartered in King of Prussia, Pennsylvania.
Size: 500 employees
Founded: 2013
