Site Reliability Eng

TenTek Inc

$120K — $160K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 7+ years of experience in technical operations and engineering, with a strong focus on systems reliability.
  • Expert knowledge of both Linux and Windows operating systems.
  • Proficiency with CI/CD platforms such as GitHub Actions and GitLab CI.
  • Hands-on experience with cloud platforms like AWS, Google Cloud, or Azure.
  • Familiarity with Infrastructure as Code tools like Terraform or CloudFormation.

Responsibilities

  • Design and lead architectural planning meetings with various teams.
  • Build and integrate automated systems and deployment tools for software solutions.
  • Monitor system performance and application stability to ensure operational excellence.
  • Collaborate closely with Imagineering Technology Studio teams for requirement gathering and troubleshooting.
  • Create automation scripts and documentation to streamline operational processes.

Benefits

  • Collaborative and high-energy work environment.
  • Opportunity to work on cutting-edge technology in theme parks and resorts.
  • Support for continuous learning and professional development.
  • Engagement in creative and innovative projects that enhance guest experiences.
Full Job Description
The Systems Reliability Engineering (SRE) team helps Imagineers create and deliver the software solutions that power experiences in our theme parks and resorts.

Systems Reliability Engineers use a software engineering approach to architect, design, automate, monitor, and build applications at scale. This includes operating and engineering software with close business segment alignment to deliver platforms through efficient, effective, and resilient architectures.

SREs are talented engineers who focus on improving quality through a data-driven approach: instrumentation, automation, and functional/unit testing.

This position is for an experienced systems engineer eager to play an integral role on the Systems Reliability Engineering team for The Walt Disney Company, supporting Imagineering to help create, build, and deliver amazing digital experiences to our guests. Primary responsibilities include designing, building, and supporting automated build and deployment systems, platforms, and cloud environments that will be used to assemble and deliver experiences to our Park and online guests.

The Senior Systems Engineer is expected to have expert-level systems administration skills on both the Linux and Windows platforms, and must have experience with CI/CD platforms (GitHub Actions, GitLab CI), systems automation (Chef/Ansible/Terraform), systems development (Go, Python, Ruby), cloud automation tools (Boto, CloudFormation, Terraform), source control, cloud hosting, container computing, web technologies, and the DevOps team culture. This position will also bring expertise on systems, operational excellence and application stability, security, performance, and capacity management, as well as documentation.

This position works closely with Imagineering Technology Studio teams to brainstorm, architect, gather requirements, troubleshoot, and provide stellar customer support. The role requires someone who is creative, proactive, constructive, and highly motivated. The Senior Systems Engineer must be prepared to work in an extremely collaborative and high-energy environment.

Job Responsibilities and Duties:

Design: Leading project and planning efforts, architectural design, engineering, and attending meetings with various teams.

Build: Implementing, integrating and configuring solutions, tools, infrastructure and systems.

Basic Qualifications

  • Understand how to install and configure operating systems, specifically with expertise in Linux and Windows Server.
  • Software Development Continuous Integration (CI) Pipeline knowledge (GitLab CI, GitHub Actions).
  • Experience in public cloud hosting services (AWS, Google Cloud, Azure) as well as familiarity with container computing (e.g., Docker, ECS, Kubernetes).
  • Proficiency in Infrastructure as Code (Terraform, CloudFormation, Bicep, Pulumi).
  • Experience with Source Control Management systems (Git).

  • Recognized as a subject matter expert on at least one OS and proficient in multiple operating systems, including OS performance monitoring, setup, configuration, tuning, and troubleshooting.
  • Proficient in web or web server technologies: Java, Node.js, Tomcat, IIS, Apache/nginx, MySQL, PostgreSQL, etc., including being able to perform basic setup, configuration, and troubleshooting.
  • Understand internet technologies and network protocols, including HTTP, basic load balancing configurations, security zones, VIPs, SNMP, REST and DNS.
  • Able to implement existing base standards for new systems and/or applications with mentoring for all of the following:
      ◦ Site monitoring and instrumentation
      ◦ Application monitoring and instrumentation
      ◦ System monitoring and instrumentation
      ◦ Resiliency and performance

  • Able to diagnose simple to complex system problems.
  • Able to author tools and scripts to be used by others to automate repeatable production tasks in standard languages like Bash, Ruby, Python, or Go.
  • Advanced skills in at least one programming language such as Python, PHP, Ruby, Java, Go, Swift or C++ and able to build unit test suites for all software being developed.
  • Experience supporting and/or developing backend tools or services.
  • Able to perform and provide in-depth analysis on load test runs against a moderately complex system.
  • Demonstrates exceptional troubleshooting methodology, including the ability to author and instruct new methodologies to the SRE team.
  • Independently resolve moderately to highly complex system and application incidents.
  • Able to identify and propose system and application fixes for performance bottlenecks.
  • Able to evaluate new application requirements for capacity and run-time best practices.
  • Able to evaluate new system and/or infrastructure solutions for technical feasibility against known requirements and standards.
  • Effective at dealing with change: Able to transition in role or handle a significant modification to workflow or technology with minimal ramp-up time and with very little guidance.

Communication and Leadership Requirements

  • Excellent verbal and written communication to all levels in the organization.
  • Serves as the primary point of contact with Manager.
  • Demonstrates curiosity and continuous learning and self-improvement.

Preferred Qualifications

Master of Science degree in computer science or a related field, or equivalent experience in technical operations and software engineering.

Required Education

BS in Computer Science or related field with 7+ years o

Additional Information

Bachelor of Science degree in computer science or a related field, or equivalent experience in technical operations and software engineering.
