Senior Systems Engineer

Graphcore

$120K — $150K *
Enterprise Technology
Less than 5 years of experience
Job Overview by Ladders

Benefits

  • Collaborative work environment fostering rapid problem-solving.
  • Opportunities to mentor and guide junior staff members.
  • Involvement in cutting-edge AI compute system development and deployment.
Full Job Description
Job Summary

We are seeking a Staff Hardware Engineer to provide advanced operational, diagnostic, and engineering support for Graphcore's Arm-based hardware platforms across lab and data center environments.

This role focuses on supporting hardware bring-up, validation, and troubleshooting of complex AI compute platforms, including server blades, racks, and rack-scale infrastructure. The successful candidate will collaborate closely with engineering, platform, and data center teams to ensure the reliability and performance of next-generation AI systems.

The Team

The Systems Engineering and Hardware Engineering teams are responsible for enabling the bring-up, validation, and operational reliability of Graphcore's AI infrastructure platforms.

The team works closely with server engineering, firmware teams, platform architects, and data center operations to support the development, testing, and deployment of next-generation AI compute systems.

This collaborative environment enables rapid problem-solving and continuous improvement of Graphcore's hardware platforms from early development through production deployment.

Responsibilities and Duties
  • Lead advanced break-fix troubleshooting for server blades, motherboards, power systems, and rack-scale infrastructure.
  • Support engineering bring-up activities, including component validation and firmware interaction testing.
  • Diagnose system-level failures involving thermal behavior, power anomalies, network configuration, and BIOS/BMC issues.
  • Collaborate with server engineering teams to perform root cause analysis and propose corrective actions or design improvements.
  • Support deployment and rollout of next-generation hardware platforms through structured validation and qualification cycles.
  • Interface with facilities and infrastructure teams to understand environmental factors impacting system reliability.
  • Develop and maintain standard operating procedures (SOPs), troubleshooting guides, and validation documentation.
  • Provide guidance and mentorship to junior technicians and engineers on troubleshooting methodologies and hardware diagnostics.
  • Participate in on-call rotations or off-hours support during critical engineering milestones or hardware bring-up phases.
Candidate Profile
Essential
  • Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science, or related discipline.
  • Strong experience with server hardware architectures and board-level debugging.
  • Experience analyzing system logs, hardware telemetry, and power/thermal metrics to isolate hardware failures.
  • Hands-on experience with HPC systems, AI compute platforms, or rack-scale infrastructure.
  • Strong collaboration skills and ability to work effectively in fast-paced engineering environments.
  • Excellent written and verbal communication skills.
Desirable
  • Experience supporting prototype or pre-production hardware bring-up.
  • Familiarity with data center facilities, including liquid cooling and power distribution systems.
  • Experience using Python, Bash, or automation tools for hardware validation or troubleshooting.
  • Exposure to structured failure analysis and reliability engineering methodologies.


USA Benefits
In addition to a competitive salary, Graphcore offers flexible working and a comprehensive benefits package designed to support your health, wellbeing and financial future. Our benefits include medical, dental and vision coverage, Flexible Spending Accounts (FSAs), Health Savings Accounts (HSAs), disability and life insurance, a 401(k) retirement plan, commuter benefits, wellness services and an Employee Assistance Programme (EAP).

We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone.
