Perficient

Lead Engineer Network Operations Center

Perficient | $73K - $170K
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience in Production Services, SRE, NOC, or similar roles.
  • Expertise in ITSM principles and enterprise incident management processes.
  • Strong hands-on experience with Dynatrace for monitoring and alerting.
  • Proficient in troubleshooting Windows and Linux server environments.
  • Operational experience with AWS services and understanding of cloud failure modes.
  • Ability to thrive under pressure in high-severity incident environments.
  • Proven track record of creating and maintaining operational runbooks.

Responsibilities

  • Lead deep-dive triage during incidents and coordinate restoration activities.
  • Enhance alert quality by tuning Dynatrace settings and defining thresholds.
  • Create and maintain operational runbooks and incident playbooks.
  • Develop dashboards and observability features to improve incident response time.
  • Troubleshoot production issues in AWS and Linux/Windows environments.
  • Participate in root cause analysis and ensure preventive measures are implemented.
  • Monitor incident trends to identify risks and suggest reliability improvements.

Benefits

  • Comprehensive health, dental, and vision insurance.
  • 401(k) plan with company matching contributions.
  • Generous paid time off and flexible work arrangements.
  • Investment in professional development and training opportunities.
  • Access to the latest tools and technologies in a collaborative environment.
Full Job Description

We currently have a career opportunity for a Lead Engineer, Network Operations Center, to join our team located in Charlotte, NC.

Job Overview:

We are seeking a Lead Engineer, Network Operations Center (NOC) to join a large Managed Services program supporting enterprise clients. In this role, you will serve as a hands-on technical leader responsible for real-time ("eyes-on-glass") monitoring, enhancing monitoring and alerting quality, accelerating incident triage and restoration, and strengthening operational readiness across critical platforms.

You will leverage industry-leading observability tools, such as Dynatrace, to build actionable dashboards and high-fidelity alerts. Additionally, you will partner closely with Site Reliability Engineering (SRE), Observability, Automation, and application/infrastructure teams to prevent incidents, reduce alert noise, and improve service resilience across AWS and Windows/Linux production environments, following ITSM best practices.

This is an exciting opportunity to deliver measurable improvements in system stability and user experience through advanced observability and incident response engineering. If you are passionate about production operations, reliability, and building pragmatic solutions at scale, we encourage you to apply and help raise the bar for resilience and operational efficiency.

Responsibilities

  • Incident Response Engineering: Serve as a senior technical escalation point during incidents; lead deep-dive triage, coordinate technical containment, and drive restoration activities with domain teams.
  • Monitoring & Alert Quality (Dynatrace): Improve signal-to-noise by tuning Dynatrace alerts, defining actionable thresholds, and implementing routing/deduplication so responders receive the right alerts at the right time.
  • Runbooks & Playbooks: Author and maintain operational runbooks and incident playbooks in partnership with service owners; ensure they are accurate, testable, and used in practice.
  • Observability Enablement: Build and enhance Dynatrace dashboards, eyes-on-glass views, and diagnostics (logs/metrics/traces) to shorten time-to-detect and time-to-diagnose for critical services.
  • Platform Troubleshooting: Troubleshoot production issues across AWS and Windows/Linux environments (compute, networking, storage, OS/application services) and engage the right domain teams with evidence-based hypotheses.
  • RCA Support: Contribute to and/or lead technical root cause analysis for significant or repeat incidents; ensure learnings translate into durable fixes and prevention actions.
  • Trend Analysis: Analyze incident and alert trends to surface systemic risks, recurring failure modes, and prioritized reliability improvements.
  • Stakeholder Updates: Provide clear, timely technical updates during incidents and post-incident reviews; communicate impact, progress, risks, and next steps.
  • Operational Readiness: Support operational readiness reviews for new services and major changes (monitoring coverage, SLOs/SLIs, runbooks, rollback plans).
  • Mentorship: Mentor engineers and analysts on troubleshooting approaches, observability practices, and incident response fundamentals.


Qualifications

  • 5+ years of progressive experience in Production Services, SRE/Operations, NOC/Command Center, or related reliability/operations engineering roles.
  • ITSM: Working knowledge of ITSM principles (incident, problem, and change management) and experience operating within an enterprise incident management process and tooling.
  • Dynatrace: Strong hands-on experience building and operating Dynatrace dashboards, alerts, and diagnostics to support eyes-on-glass monitoring and rapid troubleshooting in production.
  • Windows & Linux: Strong troubleshooting skills in Windows and Linux server environments (services, performance, logs, networking fundamentals).
  • AWS: Operational experience supporting workloads in AWS (e.g., EC2, ALB/NLB, RDS, CloudWatch/integrations, IAM basics) and understanding cloud failure modes.
  • Comfort operating in a fast-paced, high-severity incident environment, with the ability to prioritize, stay calm, and communicate clearly under pressure.
  • Experience creating and maintaining runbooks/playbooks and using them during real incidents; comfortable leading technical triage under time pressure.
  • Strong understanding of key technology components and architectural principles across cloud, databases, networking, systems, and applications.
  • Demonstrated troubleshooting and systems thinking skills, with the ability to isolate failures, validate hypotheses, and drive to resolution.
  • Excellent communication and interpersonal skills, with the ability to collaborate effectively across engineering and leadership teams.
  • Demonstrated ability to leverage AI tools to enhance productivity, streamline workflows, and support data-informed task execution.
  • Familiarity with AI-enhanced platforms is a plus.
  • A solid understanding of AI capabilities and limitations including ethical considerations is expected.
  • Ability to influence without authority and drive changes that improve reliability and operational outcomes.
  • Analytical mindset with the ability to translate operational data into actionable insights and prioritized improvements.
  • Financial services or FinTech experience would be considered a plus.
  • Demonstrated success collaborating with globally distributed teams in complex enterprise environments.
  • Strong client-facing or consulting background, with experience driving outcomes in customer-facing engagements.

About the team:

Our Product Development - Custom Modernization team helps businesses transform legacy systems and build future-ready applications. We deliver end-to-end solutions that combine cloud migration, custom application development, multi-cloud strategies, and modern UI and API integration. With expertise in DevSecOps, modern frameworks, and enterprise platforms, our team of engineers, architects, and project leaders partners with leading brands to drive innovation, accelerate delivery, and create lasting business impact. We also integrate AI-driven capabilities, such as intelligent automation, predictive analytics, and generative development tools, to enhance scalability, performance, and user experience.

Applications will be accepted until the position is filled or the posting is removed.

The salary range for this position takes into consideration a variety of factors, including but not limited to skill sets, level of experience, applicable office location, training, licensure and certifications, and other business and organizational needs. The new hire salary range displays the minimum and maximum salary targets for this position across all US locations, and the range has not been adjusted for any specific state differentials. It is not typical for a candidate to be hired at or near the top of the range for their role, and compensation decisions are dependent on the unique facts and circumstances regarding each candidate. A reasonable estimate of the current salary range for this position is $73,008.00 to $170,640.00. Please note that the salary range posted reflects the base salary only and does not include benefits or any potential variable compensation programs. Information regarding the benefits available for this position is in our benefits overview.

Disclaimer: The above statements are not intended to be a complete statement of job content, but rather to act as a guide to the essential functions performed by the employee assigned to this classification. Management retains the discretion to add or change the duties of the position at any time.


About Perficient

Perficient is a leading digital consultancy that helps companies transform their businesses and operations through technology. They deliver solutions to clients that range from Fortune 500 companies to emerging businesses. Perficient has a broad range of capabilities, including strategy, design, technology, and operations. They have expertise in a variety of industries, including healthcare, financial services, retail, and energy. Perficient has been recognized as a top employer and a top company for women technologists. They are committed to giving back to their communities through philanthropy and volunteerism.
Size: 6,079 employees
Market Cap: $2.4 billion
Net Income: $30.1 million
Founded: 1998
5 Year Trend: +9.3%
Revenue: $612.1 million
NASDAQ
