SRE/DevOps Engineer

Versana

$120K — $150K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years as a Site Reliability Engineer or similar role
  • 3+ years of experience with public cloud (Azure, AWS, GCP)
  • 3+ years in observability tools (Datadog, Elasticsearch, Grafana)
  • 3+ years in containerization and orchestration (Docker, Kubernetes)
  • 2+ years developing and managing CI/CD pipelines
  • 2+ years with Infrastructure-as-Code tools (Terraform, Azure Bicep)
  • 1+ year with site reliability tools (Gremlin, Chaos Mesh)

Responsibilities

  • Design, implement, and enhance system observability and monitoring tools
  • Monitor system performance and create incident response plans
  • Implement service-level objectives (SLOs) and indicators
  • Improve system reliability and resiliency
  • Conduct post-incident reviews and implement changes
  • Assist teams in implementing observability tools
  • Leverage observability for key incident management metrics
  • Optimize systems and workflows through architecture and automation
  • Collaborate with developers to ensure DevOps best practices

Benefits

  • Flexible working hours
  • Opportunity for remote work
  • Professional development opportunities
  • Support for certifications in cloud technologies
  • Participation in innovative engineering projects

Full Job Description

About You:
Versana is seeking a motivated SRE/DevOps Engineer with strong observability experience to join
our growing Platform Engineering squad. The squad's goal is to manage public cloud, improve
DevOps practices, and monitor Versana's real-time syndicated loan data platform. The ideal
candidate will have a deep understanding of cloud-native applications, distributed computing,
CI/CD implementation, and observability tools and practices.

Key Responsibilities:
• Design, implement, and enhance system observability and monitoring tools.
• Monitor system performance, create incident response plans, and implement observability
practices to gain insights into system behavior.
• Implement and monitor service-level objectives (SLOs) and indicators.
• Improve system reliability and resiliency.
• Conduct post-incident reviews and implement necessary changes to prevent system
failures.
• Assist teams in implementing observability tools and leveraging available telemetry data to
troubleshoot and resolve incidents and problems.
• Leverage observability and event management to improve key incident management
metrics, such as mean time to detect and mean time to restore services.
• Continually optimize systems and workflows by improving architecture, infrastructure,
automation, CI/CD, and observability.
• Collaborate with developers to ensure applications are designed with DevOps best
practices in mind.
• Participate in a rotating on-call schedule for weekend releases and be available to
respond to production issues outside of regular working hours, including weekends and
holidays.

Must Have:
• 5+ years of experience as a Site Reliability Engineer or similar role.
• 3+ years of work experience with public cloud (Azure, AWS or GCP).
• 3+ years of direct experience with observability tools such as Datadog, Elasticsearch, and
Grafana.
• 3+ years of experience with containerization and orchestration technologies like Docker
and Kubernetes.
• 2+ years of experience in development and management of CI/CD pipelines (e.g., Azure
DevOps, GitLab CI/CD, GitHub Actions, Jenkins).
• 2+ years of experience with Infrastructure-as-Code tools like Terraform, Azure Bicep, or
AWS CloudFormation.
• 1+ year of experience with site reliability tools like Gremlin, Chaos Mesh, or similar.
• Proven track record leveraging core observability concepts, end-user monitoring, and
infrastructure monitoring with SaaS solutions.
• Experience with messaging services like Kafka or Azure Event Hubs.
• Good understanding of the Linux operating system.

Nice to Have:
• Experience in at least one coding language such as Java, JavaScript, Python, Go, or .NET.
• Certifications in cloud technologies.
• Experience with Azure cloud or Azure DevOps.
• Experience with Datadog or similar modern observability tools.
