Cloud Site Reliability Engineer

Stefanini • $177K — $187K *

Dallas, TX 75217 • In-Person

Information Technology

5 - 7 years of experience

Today

Job Overview by Ladders

Qualifications

Bachelor's degree in Computer Science, Information Systems, or equivalent experience.
7+ years in software development focusing on reliability and platform engineering.
5+ years of advanced Python development for enterprise-grade tools and APIs.
3+ years in AWS environments with strong knowledge of core services and cost optimization.
3+ years applying SRE principles including observability and automation.
Expert in Infrastructure as Code (IaC) using Terraform, including module development.
Strong experience with CI/CD pipelines and automated testing frameworks.

Responsibilities

Design, develop, and maintain reliability solutions and SRE utilities.
Build and optimize Infrastructure as Code (IaC) using Terraform.
Develop CI/CD pipelines and automated testing for code quality.
Define SRE standards and establish metrics such as SLIs and SLOs.
Apply software engineering best practices in all development tasks.
Participate in incident management and on-call support.
Stay current with emerging AWS services and SRE methodologies.
Collaborate within Agile frameworks for integrated cloud automation solutions.

Benefits

Opportunities for remote and hybrid work arrangements
Commitment to developing long-term employee relationships
Engagement with teams for constructive and transparent communication
Exposure to innovative technologies and practices in cloud engineering
Support for professional development and growth within the company

Full Job Description

As a Senior Cloud Engineer in the Cloud SRE team, you will be responsible for designing and developing cloud solutions and engineering reliability tools for the Cloud Foundation Services (CFS) platform in the Infrastructure, Platforms & Operations organization. You will apply software engineering practices to build scalable, reusable solutions and utilities that enhance platform reliability.

Responsibilities:

What Will Be Expected of You:

Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices across the system
Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles
Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of the solutions
Define SRE standards, best practices, and guidelines for adoption across teams; establish SRE metrics such as SLIs and SLOs
Apply software engineering best practices including version control, code reviews, test-driven development, and documentation to all development
Participate in incident management and on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis
Stay current with emerging AWS services, SRE methodologies, and cloud-native development technologies, and drive adoption of innovative solutions
Collaborate within Agile and Scaled Agile frameworks with cross-functional teams to deliver integrated cloud automation solutions
Produce clear, blameless postmortems with actionable items and documented failure scenarios

Job Requirements

Details:

Qualifications:

Bachelor's degree in Computer Science, Information Systems, or equivalent background or experience
7+ years of extensive experience in software development with a focus on reliability and platform engineering
5+ years of advanced Python development skills with proven experience building enterprise-grade, highly available tools, APIs, and utilities
3+ years of hands-on experience developing solutions in AWS environments with a deep understanding of core services (EC2, VPC, S3, Lambda, IAM, CloudFormation, EventBridge, Step Functions, etc.) and resource cost optimization
3+ years of experience applying SRE principles, including observability, toil automation, SLIs/SLOs, and reliability engineering
Expert-level proficiency with Infrastructure as Code (IaC) using Terraform, including module development and state management
Strong experience with CI/CD pipelines, automated testing frameworks, and DevOps practices
Experience with observability tools and practices including Grafana, AWS CloudWatch, AWS Canary
Experience defining, implementing, and managing SLOs/SLIs and error budgets; familiarity with conducting RCAs and producing postmortem documentation
Working experience in Agile and Scaled Agile environments; familiarity with ITSM processes (incident, change, and problem management), resilience testing, and chaos engineering practices
Experience with GoLang or additional programming languages is a plus

Stefanini takes pride in hiring top talent and developing relationships with our future employees. Our talent acquisition teams will never make an offer of employment without having a phone conversation with you. Those conversations will include a description of the job for which you have applied. We will also speak with you about the process, including interviews and job offers.

Pay Range:

$85.00 - $90.00

* Ladders Estimates
