System Reliability Analyst

Medix

$85K — $110K *
Information Technology
Less than 5 years of experience
Job Overview by Ladders

Qualifications

  • Bachelor's degree in Computer Science or related technical field.
  • 3-5+ years of hands-on experience in production systems support or application development within Unix/Linux environments.
  • Proficient in scripting languages such as Python, bash, Perl, or Ruby for automation tasks.
  • In-depth knowledge of System/Site Reliability Engineering (SRE) principles and practices.
  • Experience with enterprise monitoring tools like AppDynamics, Grafana, Splunk, and Dynatrace, particularly for designing SLO/SLI dashboards.
  • Familiarity with automation/configuration/release management tools such as Puppet, Ansible, or Chef.
  • Solid understanding of modern software architectures and distributed systems, including load balancing and microservices.

Responsibilities

  • Collaborate with engineering teams to design, build, optimize, and maintain systems.
  • Diagnose and troubleshoot issues across hardware, software, applications, and networks.
  • Identify and mitigate operational risks and toil through effective solutions.
  • Enhance observability across infrastructure and applications.
  • Proactively manage risks to system reliability.
  • Influence application design and operational readiness with a focus on reliability.

Benefits

  • Flexible work hours and remote work options.
  • Opportunities for professional development and continuous learning.
  • Collaborative and innovative work culture.
  • Access to state-of-the-art tools and technologies.
  • Chance to work on high-impact projects that enhance system reliability.
Full Job Description
Responsibilities:
  • Working closely with engineering/development teams to design, build, optimize, and maintain systems.
  • Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
  • Aggressively targeting toil and operational risk and deploying solutions to reduce them.
  • Broadening infrastructure and application observability.
  • Proactively identifying and addressing active or potential risks to system reliability.
  • Advocating for reliability priorities in application design reviews and operational readiness exercises for new and existing services.

Required Skills:
  • Bachelor's degree in Computer Science or another technical discipline.
  • 3-5+ years of practical experience in production systems support or application development.
  • Hands-on experience managing systems in a large-scale distributed Unix/Linux environment is essential.
  • Automation-related experience is required, using scripting languages such as Python, bash, Perl, and/or Ruby.
  • Deep knowledge of and hands-on experience applying the principles of System/Site Reliability Engineering (SRE).
  • Practical experience designing and instrumenting SLO/SLI dashboards is particularly valuable.
  • Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
  • Experience with Puppet, Ansible, Chef, GitHub, or other automation/configuration/release management tools.
  • Awareness of, and ability to reason through, modern software and systems architectures, including load balancing, databases, queueing, caching, distributed systems failure modes, microservices, cloud, etc.
  • Working ability to interact with message transport platforms and protocols (MQ, CPS, XML, FIX) and distributed database technologies (DB2, Sybase, Mongo, GreenPlum, Postgres, KDB).
  • Familiarity with Autosys scheduling and batch processing concepts.
  • Deep understanding of infrastructure and operating system concepts such as processes, memory allocation, and networking, including how they affect applications, and the ability to debug and troubleshoot accordingly.

Preferred Skills:
  • Experience with higher-level compiled languages such as C++, C#, Java, Scala, and Go is a big plus.

Similar Jobs

More Jobs at Medix

  • Full stack Engineer
    $100K — $150K *
    New York, NY 10025 (New York County)
    Information Technology
    In-Person
  • Test Engineer
    $70K — $95K *
    Vienna, VA 22182 (Fairfax County)
    Information Technology
    In-Person
  • Finance Analyst 2
    $70K — $95K *
    San Francisco, CA 94112 (San Francisco County)
    Finance & Insurance
    In-Person
  • Network Architect
    $100K — $130K *
    Chicago, IL 60629 (Cook County)
    Energy & Utilities
    In-Person
  • Staff Engineer
    $90K — $120K *
    Portage, MI 49024 (Kalamazoo County)
    Aerospace & Defense
    In-Person