Senior Site Reliability Engineer

Akamai Technologies • $121K — $218K *

Cambridge, MA 02139 • In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

5+ years of relevant experience in site reliability engineering or related fields.
Bachelor's degree in Computer Engineering, Computer Science, or equivalent.
Proficient in Python for building scalable tools and automation frameworks.
Hands-on experience with monitoring tools like Prometheus, Grafana, and OpenTelemetry.
Understanding of networking principles, including BGP and IPv4/IPv6.
Experience in designing service rollouts and establishing operational readiness criteria.
Ability to manage complex incidents and develop technical runbooks.

Responsibilities

Develop and scale programmatic tooling in Python to automate operational tasks.
Integrate automated workflows across corporate ticketing systems to reduce response times.
Leverage AI utilities to enhance technical execution and system analysis.
Work on private cloud technologies to improve hardware availability and performance.
Design telemetry pipelines and monitoring dashboards for both virtualized and bare-metal environments.
Participate in 24/7 on-call rotations for incident management and service disruptions.
Collaborate with third-party vendors for field technician coordination and uptime activities.

Benefits

Healthcare coverage including mental and financial wellness support.
401K savings plan with company matching contributions.
Generous paid time off including vacation and sick days.
Family-friendly benefits including parental leave.
Employee assistance program available.

Full Job Description

Job Description

Do you enjoy collaborating with teams to solve complex challenges?

Do you enjoy solving large-scale distributed content delivery challenges?

Join our critical AI Hardware SRE Team!

The AI Hardware SRE team is responsible for overseeing, scaling, and optimizing our next-generation dedicated AI hardware infrastructure. You will be responsible for ensuring best-in-class uptime and reliability of our AI hardware infrastructure offerings.

Partner with the best

In this role, you'll play a part in pioneering the reliability of an elite, high-density hardware and software infrastructure spanning the globe. You'll collaborate with product teams from the earliest stages of development to ensure the reliability, scalability, and performance of our systems. You'll define key performance indicators and defend them when they are breached.

As a Senior Site Reliability Engineer, you will be responsible for:

Developing and scaling robust programmatic tooling and infrastructure-as-code utilities in Python to eliminate operational toil and automate fleet-wide provisioning.
Integrating automated workflows across disconnected corporate ticketing systems to optimize time-to-mitigate metrics for hardware and network break-fix events.
Leveraging advanced AI utilities and LLM-assisted development paradigms where appropriate to accelerate technical execution, script authorship, and system analysis.
Working on cutting-edge private cloud and compute technologies to improve the availability, latency, and overall systemic health of high-density hardware environments.
Designing and implementing telemetry pipelines, custom Prometheus/Grafana monitoring dashboards, and AI-based anomaly detection tailored for bare-metal and virtualized environments.
Participating in 24x7x365 on-call rotations, spearheading real-time incident management, and managing high-severity service disruption protocols via automated PagerDuty and Slack workflows.
Partnering directly with third-party infrastructure vendors and coordinating on-site field technicians to facilitate uptime activities.

Do what you love

To be successful in this role you will:

Have 5 years of relevant experience and a Bachelor's degree in Computer Engineering, Computer Science, or equivalent.
Possess tooling and coding ability in languages like Python to construct scalable operational tools, API integrations, and automation frameworks.
Show hands-on experience with modern observability stacks and time-series engines, like Prometheus, Grafana, OpenTelemetry, and Loki.
Possess a working understanding of advanced networking topologies, high-bandwidth routing/switching infrastructure, BGP, and dual-stack IPv4/IPv6 networks.
Have experience acting as a key designer for new service rollouts, including establishing operational readiness criteria, telemetry baselines, and alerting thresholds.
Demonstrate extensive experience building technical runbooks, leading complex incident response bridges, and driving comprehensive, blameless post-mortems.
Display a proven ability to take absolute ownership of ambiguous technical problems, coordinate cross-functional teams, and drive for production-grade solutions.

Compensation

Akamai is committed to fair and equitable compensation practices. For US-based candidates only, the base salary for this position ranges from $121,400 to $218,600 per year; a candidate's salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications, and location. Compensation for candidates outside the US will vary. The compensation package may also include incentive compensation opportunities in the form of annual bonus or incentives, equity awards, and an Employee Stock Purchase Plan (ESPP). Akamai provides industry-leading benefits including healthcare, a 401K savings plan, company holidays, vacation (in the form of PTO), sick time, family-friendly benefits including parental leave, and an employee assistance program including a focus on mental and financial wellness. Eligibility requirements apply.

About Akamai Technologies

Akamai Technologies, Inc. is a global content delivery network (CDN), cybersecurity, and cloud service company. The company provides web and mobile performance solutions, cloud security solutions, enterprise access solutions, and video delivery solutions. Akamai was founded in 1998 and is headquartered in Cambridge, Massachusetts. The company serves a wide range of industries, including media and entertainment, gaming, software, financial services, healthcare, and others. Akamai is publicly traded on the NASDAQ stock exchange under the ticker symbol AKAM.

Size: 8,700 employees
Market Cap: $13 billion
Industry: Information Technology
Net Income: $557 million
Founded: 1998
5 Year Trend: +8.1%
Revenue: $3.1 billion
NASDAQ: AKAM

* Ladders Estimates
