THE POSITION:
STACK is looking for an Infrastructure Reliability Engineer with subject matter expertise in electrical systems who will act as a key member of STACK's Critical Operations team. This position will play a vital role in ensuring the ongoing performance, resiliency, and evolution of infrastructure systems across STACK's portfolio. This role requires deep technical fluency in data center power and cooling systems, a forensic mindset for failure analysis, and a proactive approach to risk reduction. Responsibilities include but are not limited to:
- Lead deep-dive investigations and RCAs for electrical infrastructure failures, including UPS systems, switchgear, breakers, relays, generators, grounding systems, STS behavior, VFD interactions, controls, and power quality disturbances
- Evaluate electrical system performance under abnormal, fault, or degraded conditions (e.g., grounding faults, harmonics, transient events, protective coordination, voltage distortion, transfer events) to identify systemic vulnerabilities
- Engage OEMs and vendors to challenge technical assumptions and advocate for long-term improvements
- Support the evolution of maintenance standards and asset strategy for high-risk or complex systems (e.g., power distribution, cooling)
- Collaborate with Workforce Development to enhance technical training for site teams based on lessons from event investigations
- Contribute to availability reporting, event response improvement, and risk trend monitoring to ensure SLA commitments are met
- Inform and influence the design review and turnover process by identifying gaps in infrastructure handoffs, system limitations, or commissioning practices
- Develop system-level failure mode mitigation strategies that improve uptime performance and reduce repeat incidents
- Partner with Operations, Engineering and Construction to review electrical design assumptions, protective schemes, equipment compatibility, and commissioning practices to identify long-term reliability risks prior to or following operational events
THE DETAILS:
- Location: Manassas or Sterling, VA; Portland, OR; Chicago (CHI); or Dallas-Fort Worth (DFW)
- Benefits: Healthcare, Dental Care, Vision Insurance, Life Insurance, Paid Time Off, Paid Leave Programs
- Travel: 25% domestically
- Must be eligible to work in the United States
- Must pass a comprehensive background screening
MUST-HAVE QUALIFICATIONS:
- 5-8 years of experience in critical infrastructure environments (e.g., data centers, substations, power generation, or utility systems)
- Strong technical fluency in mission-critical electrical systems, including power distribution architecture, UPS systems, generators, grounding methodologies, protective relays, switchgear, controls integration, and power quality analysis
- Experience analyzing electrical failures through waveform data, event logs, relay coordination, commissioning findings, or forensic troubleshooting
- Working knowledge of electrical system design intent versus operational field realities, including maintainability, equipment compatibility, and fault response
- Hands-on experience with root cause analysis and reliability methodologies (e.g., FMEA, RCM)
- Demonstrated ability to work across disciplines (Ops, Eng, Vendors, Construction) to resolve complex technical issues
- Expertise with commissioning (Cx) and infrastructure design review processes
- Ability to analyze performance data and translate findings into practical improvements
- Bachelor's degree in Engineering or equivalent experience with high technical competency
THIS MIGHT BE RIGHT FOR YOU IF:
- You are a strong communicator: persuasive and clear, blending analytics with experience in decision-making.
- You do not get flustered easily. You can juggle multiple priorities while balancing urgent requests with shifting timelines and deliverables.
- You are a team builder. You take the time to understand and develop the strengths of your resources while formulating long-term plans for the growth and success of the team.
- You are naturally curious and driven toward continual improvement. While you celebrate your successes, you take time to review and analyze outcomes for future learning.