Senior Site Reliability Engineer

Walmart, Inc.

• $112K — $180K *

Dallas, TX 75217

In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

Master's degree in Computer Science or related field with 1 year of experience, or Bachelor's degree with 3 years of experience.
Proficiency in managing and orchestrating Kubernetes clusters with Helm charts.
Experience with server management in AWS and orchestration tools like Ansible and Terraform.
Familiarity with building scalable monitoring systems using CloudWatch, Grafana, and PRTG.
Skilled in managing RDBMS (PostgreSQL, MSSQL) and non-RDBMS (Redshift, MongoDB) databases.

Responsibilities

Assist in creating functional product designs following requirements.
Evaluate trade-offs in design across various components based on business needs.
Convert high-level designs into detailed functional logic and mock screens.
Automate infrastructure through code and adhere to coding guidelines.
Identify automation opportunities in CI/CD and testing processes.
Develop telemetry features and apply security policies during coding.
Monitor site reliability conditions and propose improvements or metrics.

Benefits

Comprehensive medical, vision, and dental coverage.
401(k) plan with company contributions.
Paid time off including sick leave, parental leave, and family care leave.
Education assistance, including college degrees fully paid by the company.
Short-term and long-term disability coverage.

Full Job Description

What you'll do...

Position: Senior Site Reliability Engineer

Job Location: 14901 Quorum Drive, Dallas, TX 75254

Duties: Assist in creating simple, modular, extensible, and functional design for the product/solution in adherence to the requirements. Evaluate trade-offs while designing across multiple components in a product based on business requirements. Convert the high-level design (HLD) into a detailed design using mock screens, pseudocode, and detailed functional logic for specific modules and components of a product/system. Understand nuances of designing for disaster recovery. Design and create an MVP to clarify requirements and design and to uncover risks. Refine the MVP design for early defects and revised customer requirements.

Undertake infrastructure coding automation. Adhere to all relevant coding guidelines while writing/configuring code. Create/configure minimalistic (less complex, highly robust, and high quality) code for a component/module under guidance. Maintain records by documenting program development and revisions. Stay updated on the prevalent coding languages and frameworks in the industry outside the immediate scope of delivery.

Identify repetitive and routine tasks in Continuous Integration/Continuous Delivery (CI/CD), testing, or any other process that can be automated. Implement telemetry features as required under guidance. Apply security policy requirements to component/module during code development/configuration. Detect and document defects, bugs, and errors for assigned component/module and conduct analysis to determine the sources under guidance. Troubleshoot performance and availability bottlenecks for assigned application under guidance.

Work with business partners to identify and document critical applications. Interpret and follow procedures in contingency plans. Explain the contingency and disaster recovery plans for assigned environment. Execute established procedures necessary to continue operations in an emergency. Participate in the design of a minimum operating environment for a computer-based facility.

Utilize established criteria (for example, probability of failure, frequency of failure) to measure site reliability. Monitor site reliability conditions and new reliability requirements. Assist in the design and development of a reliability program plan for a specific site environment. Apply appropriate tools, services, or applications for reliability prediction and other site improvements. Research and assess various reliability models for different site environments.

Suggest metrics to monitor software or system performance. Monitor current performance data to ensure compliance with defined SLOs for multiple applications/systems. Determine thresholds for monitoring metrics and trigger alerts based on thresholds. Help with specific procedures to proactively check the health of applications and infrastructure, including a variety of operating systems, hardware, and software. Make recommendations regarding situational awareness and alerting. Make recommendations regarding instrumentation gaps and alerting logic, including a variety of operating systems, hardware, and software.

Minimum education and experience required: Master's degree or the equivalent in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, Electrical Engineering, or related area and 1 year of experience in site reliability engineering, site and system administration, infrastructure management, or related area; OR Bachelor's degree or the equivalent in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, Electrical Engineering, or related area and 3 years of experience in site reliability engineering, site and system administration, infrastructure management, or related area.

Skills required: Experience with the management and orchestration of Kubernetes clusters with Helm charts. Experience with networking solutions including VPN systems, firewall technologies, and storage systems. Experience building scalable monitoring and observability systems using CloudWatch, PRTG, Grafana, and PagerDuty. Experience with server management in AWS with orchestration tools, including Ansible, Puppet, and Terraform. Experience managing DNS and SSL certificates in AWS. Experience managing enterprise workloads in an AWS infrastructure. Experience building CI/CD pipelines using GitHub Actions, CodeBuild, CodePipeline, and CircleCI. Experience managing RDBMS including PostgreSQL and MSSQL Server and non-RDBMS including Redshift and MongoDB. Experience writing unit and integration tests. Experience with tool development, including scripting with Bash and high-level languages: Python and TypeScript. Employer will accept any amount of experience with the required skills.

Salary Range: $112,923/year to $180,000/year. Additional compensation includes annual or quarterly performance incentives.

Benefits: At Walmart, we offer competitive pay as well as performance-based incentive awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty and voting. Other benefits include short-term and long-term disability, education assistance with 100% company paid college degrees, company discounts, military service pay, adoption expense reimbursement, and more.

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see One.Walmart.com.

* Ladders Estimates
