HPC Network Engineer

Compunnel

$80K — $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science, IT, or related field.
  • Proficiency in RoCE protocols, especially RoCEv2.
  • Experience in designing and configuring high-performance RoCE-enabled networks.
  • Strong skills in performance tuning, congestion management, and network optimization.
  • Familiarity with security measures for RDMA traffic.
  • Hands-on experience with HPC environments and data center networks.
  • Proficiency in network monitoring and troubleshooting tools.

Responsibilities

  • Design and configure RoCE networks, including switches, adapters, and Ethernet fabrics.
  • Optimize network settings, including MTU and buffer sizes, for peak performance.
  • Implement congestion management mechanisms for efficient data flow.
  • Monitor and tune network performance with specialized tools.
  • Implement security protocols to safeguard RDMA traffic.
  • Coordinate with vendors for compatibility and support.
  • Collaborate with cross-functional teams on cloud migration and support RDMA-enabled applications.

Benefits

  • Work in a cutting-edge technology environment focused on HPC.
  • Opportunity for professional growth and technical mastery.
  • Engage with leading vendors and state-of-the-art network technologies.

Full Job Description

Job Summary

We are seeking a highly skilled HPC Network Engineer to design, deploy, and optimize high-performance computing (HPC) clusters with a focus on RoCE (RDMA over Converged Ethernet) technologies.

The ideal candidate will lead efforts in network configuration, performance tuning, security implementation, and vendor coordination to support low-latency, high-bandwidth communication across HPC environments.

Key Responsibilities

RoCE Network Design and Optimization
  • Design and configure RoCE networks including switches, adapters, and Ethernet fabrics.
  • Optimize network settings such as MTU, buffer sizes, and flow control parameters for peak performance.
  • Implement congestion management mechanisms such as Priority Flow Control (PFC) and other Data Center Bridging (DCB) features.
  • Configure RoCE-aware switches and routers for efficient RDMA traffic routing.
  • Monitor and tune network performance using tools like Ethernet Performance Monitoring (EPM) and InfiniBand Performance Monitoring (IPM).


Security and Compliance
  • Implement security protocols such as MACsec and IPsec to secure RDMA traffic.
  • Enforce access controls and certificate-based authentication for secure endpoint communication.


Vendor Management
  • Coordinate with hardware/software vendors to ensure compatibility and support.
  • Define technical requirements and evaluate vendor solutions through PoCs.
  • Maintain regular communication with vendors for updates, issue resolution, and performance reviews.


Collaboration and Support
  • Work with cross-functional teams to support cloud migration and lifecycle management.
  • Lead troubleshooting efforts and resolve complex network configuration issues.
  • Support RDMA-enabled applications and parallel computing frameworks (e.g., MPI, OpenMP).


Required Qualifications
  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Proficiency in RoCE protocols including RoCEv2.
  • Experience designing and configuring high-performance RoCE-enabled networks.
  • Strong skills in performance tuning, congestion management, and network optimization.
  • Familiarity with security measures for RDMA traffic and access control mechanisms.
  • Hands-on experience in deploying and managing HPC environments and data center networks.
  • Proficiency in network monitoring and troubleshooting tools.


Preferred Qualifications
  • Advanced degree in a technical field or equivalent practical experience.
  • Certifications such as CCIE or vendor-specific RoCE certifications.
  • Experience with network equipment from vendors like Juniper, Cisco, Arista.
  • Working knowledge of firewalls (stateful/stateless) and Linux/UNIX systems.
  • Automation experience using Python scripting or Ansible.
  • Familiarity with DevOps practices and CI/CD pipelines.


Certifications
  • CCIE or equivalent vendor certifications (preferred).


Education: Bachelor's Degree

Certification: RoCE Certification, Cisco Certified Internetwork Expert
