Royal Caribbean Group

Senior Engineer, Site Reliability

Royal Caribbean Group
$100K - $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 6-9+ years in Observability, SRE, or Platform Engineering with large-scale environments.
  • Hands-on experience with Cisco AppDynamics, including APM configuration and diagnostics.
  • Proficiency in Splunk for query development and log design.
  • Experience with network monitoring using Cisco ThousandEyes.
  • Strong skills in PagerDuty AIOps, including alert grouping and event orchestration.
  • Familiarity with OpenTelemetry for telemetry strategy and instrumentation.
  • Hands-on Kubernetes experience (EKS/AKS) for container observability.

Responsibilities

  • Own and evolve an enterprise observability platform across major technologies.
  • Design and implement a unified telemetry strategy using OpenTelemetry.
  • Govern telemetry data pipelines for optimal data quality and cost.
  • Drive observability coverage for both ship and shore environments.
  • Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services.
  • Implement noise-minimizing alert frameworks integrated with on-call workflows.
  • Lead the definition of observability standards and mentor junior engineers.

Benefits

  • Innovative work environment within a leading maritime and hospitality technology setting.
  • Growth opportunities through mentorship and defined career paths.
  • Collaboration with diverse teams across multiple brands.
  • Access to cutting-edge technology and tools for observability.
  • Potential for involvement in high-impact projects affecting 100,000+ users.
Full Job Description
The Royal Caribbean Group's Site Reliability Team has an exciting career opportunity for a full-time Senior Engineer, Site Reliability, reporting to the Senior Manager.

This position is onsite and based in Miramar, Florida.

This position is also not eligible for work authorization sponsorship.

Position Summary:

We are seeking a highly skilled Senior Site Reliability Engineer to own, operate, and continuously mature our enterprise observability platform across one of the most complex hospitality and maritime technology environments in the world. This role is the engineering backbone of RCG's observability practice - responsible for ensuring deep, reliable system visibility across 950+ applications serving 100,000+ users across Royal Caribbean International, Celebrity Cruises, and Silversea.

You will operate at the intersection of infrastructure, application performance, network intelligence, and AIOps - driving measurable improvements in mean-time-to-detect (MTTD), mean-time-to-resolve (MTTR), and overall service reliability. This is a platform engineering and standards leadership role, not a tool administration position.

Key Responsibilities:

Platform Ownership & Architecture
  • Own and evolve the enterprise observability platform spanning Cisco AppDynamics, Splunk, ThousandEyes, and PagerDuty AIOps across AWS and Azure environments.
  • Architect and enforce a unified telemetry strategy - metrics, logs, traces, and events - standardized via OpenTelemetry across all application tiers.
  • Design and govern telemetry data pipelines including ingestion, filtering, routing, and retention to optimize signal quality and platform cost at enterprise scale.
  • Drive full-stack observability coverage across ship and shore environments, including maritime network paths, contact center platforms, and revenue-critical booking systems.
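To make the pipeline governance responsibility more concrete, here is a minimal Python sketch of the filter, route, and retain stages applied to individual telemetry records. The record fields, tier names, retention values, and the rule of dropping DEBUG logs are all hypothetical assumptions for illustration; in practice these stages usually live in pipeline configuration (for example an OpenTelemetry Collector or a tool such as Cribl) rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical retention policy (days per storage tier); real values would come from cost and compliance rules.
RETENTION_DAYS = {"hot": 30, "archive": 365}


@dataclass
class TelemetryRecord:
    signal: str    # "log", "metric", or "trace"
    severity: str  # e.g. "DEBUG", "INFO", "WARN", "ERROR"
    service: str


@dataclass
class RoutedRecord:
    record: TelemetryRecord
    tier: str
    retention_days: int


def route(record: TelemetryRecord) -> Optional[RoutedRecord]:
    """Filter a record at ingestion, then pick a storage tier and retention period."""
    # Filtering: drop DEBUG logs before indexing, since they are high volume and low signal.
    if record.signal == "log" and record.severity == "DEBUG":
        return None
    # Routing: metrics, traces, and WARN/ERROR logs stay searchable in the hot tier;
    # all remaining logs go to a cheaper archive tier.
    if record.signal != "log" or record.severity in ("WARN", "ERROR"):
        tier = "hot"
    else:
        tier = "archive"
    return RoutedRecord(record, tier, RETENTION_DAYS[tier])


# Hypothetical batch from a single service.
batch = [
    TelemetryRecord("log", "DEBUG", "booking-api"),
    TelemetryRecord("log", "ERROR", "booking-api"),
    TelemetryRecord("log", "INFO", "booking-api"),
    TelemetryRecord("metric", "INFO", "booking-api"),
]
kept = [r for r in map(route, batch) if r is not None]
```

Keeping the decision in one pure function makes each rule easy to test and audit, which matters when retention tiers drive platform cost.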


SLIs, SLOs & Reliability Engineering
  • Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for all critical services across RCG's three brands.
  • Build alerting frameworks that minimize noise, surface actionable signals, and integrate cleanly with PagerDuty AIOps on-call workflows.
  • Partner with SRE teams to drive MTTR reduction, post-incident observability improvements, and proactive reliability practices.
  • Instrument and publish DORA metrics (Deployment Frequency, Lead Time, Change Failure Rate, MTTR) to support engineering productivity and release confidence.
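The SLO and error budget work described above rests on simple arithmetic: an event-based SLI is good events divided by total events, and the error budget is the share of events the SLO allows to fail (1 minus the target) over the same window. The sketch below assumes a request-count SLI; the function and field names and the 99.9% example target are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    sli: float                # observed fraction of good events
    allowed_bad: float        # bad events the SLO tolerates in this window
    remaining_bad: float      # allowed_bad minus observed bad events (negative means the SLO is breached)
    consumed_fraction: float  # share of the budget already spent


def error_budget(good_events: int, total_events: int, slo_target: float) -> ErrorBudget:
    """Compute an event-based SLI and its error budget for one SLO window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    bad_events = total_events - good_events
    sli = good_events / total_events
    # The budget is the complement of the target, scaled to the window's traffic.
    allowed_bad = (1.0 - slo_target) * total_events
    return ErrorBudget(
        sli=sli,
        allowed_bad=allowed_bad,
        remaining_bad=allowed_bad - bad_events,
        consumed_fraction=bad_events / allowed_bad,
    )


# Hypothetical example: a 99.9% availability SLO over one million booking requests.
report = error_budget(good_events=999_500, total_events=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%}  budget consumed={report.consumed_fraction:.0%}")
```

Expressing the budget in events rather than percentages makes it easy to alert on how quickly it is being spent.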


AIOps & Intelligent Detection
  • Drive AI-assisted incident detection, anomaly correlation, and root cause analysis using PagerDuty AIOps and Splunk IT Service Intelligence (ITSI).
  • Tune and mature ML-based alert grouping and noise suppression models to reduce alert fatigue and accelerate triage.
  • Integrate observability signals with ServiceNow ITSM for automated incident creation, enrichment, and closed-loop resolution workflows.
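Alert grouping reduces noise by folding related alerts into a single incident. PagerDuty AIOps provides its own grouping methods, including ML-based grouping, which are not reproduced here. The sketch below is a deliberately simplified stand-in: it groups alerts that share a hypothetical grouping key whenever each one arrives within a fixed time window of the previous alert for that key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    key: str          # hypothetical grouping key, e.g. "<service>:<check>"
    timestamp: float  # seconds since epoch


@dataclass
class Incident:
    key: str
    last_seen: float
    alerts: List[Alert] = field(default_factory=list)


def group_alerts(alerts: List[Alert], window_s: float = 300.0) -> List[Incident]:
    """Fold alerts with the same key into one incident while they keep arriving within window_s."""
    open_by_key: Dict[str, Incident] = {}
    incidents: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_key.get(alert.key)
        if current is not None and alert.timestamp - current.last_seen <= window_s:
            # Same key and still inside the window: attach to the existing incident.
            current.alerts.append(alert)
            current.last_seen = alert.timestamp
        else:
            # New key, or the previous incident went quiet too long: open a new incident.
            current = Incident(alert.key, alert.timestamp, [alert])
            open_by_key[alert.key] = current
            incidents.append(current)
    return incidents


alerts = [
    Alert("db-latency", 0),
    Alert("web-5xx", 50),
    Alert("db-latency", 100),
    Alert("db-latency", 1000),
]
incidents = group_alerts(alerts, window_s=300)
```

Here four alerts become three incidents: the first two database alerts merge, while the late one opens a fresh incident because it falls outside the window.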


Kubernetes & Cloud-Native Observability
  • Enable and govern Kubernetes observability for EKS and AKS workloads - container health, resource utilization, pod-level tracing, and cluster performance.
  • Integrate observability instrumentation into CI/CD pipelines (GitHub Actions) to enable deployment-correlated performance analysis.
  • Maintain and extend AWS CloudWatch and Azure Monitor integrations to ensure cloud infrastructure is fully represented in the observability estate.


Standards, Enablement & Technical Leadership
  • Define observability standards, instrumentation best practices, and onboarding frameworks for product and platform engineering teams.
  • Mentor junior engineers and serve as the technical authority for observability discipline across SRE and Platform Engineering.
  • Lead post-incident reviews (PIRs) and translate findings into observability platform improvements.
  • Govern observability cost optimization: telemetry volume management, retention tiering, and platform licensing efficiency.


Required Qualifications
  • 6-9+ years in Observability, SRE, or Platform Engineering in enterprise-scale environments.
  • Deep hands-on expertise with Cisco AppDynamics - APM configuration, business transaction mapping, code-level diagnostics, and baseline management.
  • Strong proficiency with Splunk - SPL query development, ITSI service health trees, KPI configuration, alert policy management, and log pipeline design.
  • Experience with Cisco ThousandEyes for network path monitoring, ISP/WAN intelligence, and BGP-level visibility.
  • Proficiency with PagerDuty AIOps - intelligent alert grouping, noise suppression, event orchestration, and on-call workflow design.
  • Strong command of OpenTelemetry - collector configuration, SDK instrumentation, semantic conventions, and multi-backend exporting.
  • Hands-on Kubernetes experience (EKS/AKS) - container observability, resource metrics, and pod-level distributed tracing.
  • Experience with AWS CloudWatch and/or Azure Monitor for cloud infrastructure observability.
  • Scripting and automation proficiency: Python, Bash, Terraform, and/or Ansible for observability tooling deployment and configuration.
  • Experience defining SLIs/SLOs, error budgets, and actionable alerting strategies tied to business service reliability.
  • ServiceNow ITSM integration experience - event management, incident auto-creation, and CMDB-enriched alerting.
  • Experience with CI/CD observability integration (GitHub Actions or equivalent).


Preferred Qualifications
  • Experience with Prometheus, Grafana, Loki, or Tempo for supplemental or hybrid observability architectures.
  • Familiarity with eBPF-based observability tooling (e.g., Pixie, Cilium) for deep kernel-level and network-layer visibility.
  • Experience with synthetic monitoring and real user monitoring (RUM) to capture end-user experience across digital channels.
  • Familiarity with Cribl or equivalent telemetry pipeline tooling for data routing, enrichment, and cost governance.
  • Exposure to DORA metrics instrumentation and developer experience observability frameworks.
  • Experience in large-scale hospitality, travel, maritime, or consumer digital platforms.
  • Certifications: Cisco AppDynamics Certified Associate, Splunk Core Certified Power User, AWS Solutions Architect, Kubernetes (CKA/CKAD), or OpenTelemetry Certified Associate (OTCA/CNCF).


Agency and Third-Party Submissions: Please note this is a direct search by the Company, and applications through agencies and other third parties will not be accepted, nor will fees be paid for unsolicited resumes. Any unsolicited resumes will be considered the Company's property.

We know there's a lot to consider. As you go through the application process, our recruiters will be glad to provide guidance and more relevant details to answer any additional questions. Thank you again for your interest in Royal Caribbean Group. We hope to see you onboard soon!

About Royal Caribbean Group

Royal Caribbean Group is a cruise vacation company with a fleet of 63 ships sailing around the world. In addition to Royal Caribbean International, its brands include Celebrity Cruises and Silversea Cruises. Royal Caribbean Group was established in 1968 in Miami, Florida.

Royal Caribbean Group Careers

There has never been a more exciting time to explore job opportunities with Royal Caribbean Group, a leader in the global cruise industry known for innovation and excellence.

Work You’ll Do

Join Royal Caribbean Group's dynamic team to help redefine the travel experience for millions of guests worldwide. The company's commitment to growth and leadership in the cruise industry offers a unique platform for professionals to advance their careers. Transform the future of travel with Royal Caribbean Group, where diversity, innovation, and a passion for service converge to create extraordinary vacation experiences. Lead in a market where skills in technology, customer service, and operational excellence are prized. Royal Caribbean Group stands at the forefront of the travel industry, offering team members unparalleled opportunities for career advancement. Work alongside a global team of professionals dedicated to pioneering new paths in the cruise sector. Royal Caribbean Group fosters a culture of innovation and leadership, making it an ideal workplace for those aiming to make a significant impact.

Royal Caribbean Group Professional Pathways

The team is actively building a robust professional network, inviting individuals to master their career journey in the vibrant world of cruise travel.

Do Innovative Work

Engage with a diverse team at Royal Caribbean Group—professionals dedicated to reshaping the future of travel through continuous innovation and a deep understanding of the global travel market.

Drive Innovation and Leadership

Deliver targeted solutions and exceptional guest experiences by leveraging deep industry knowledge and a commitment to innovation that’s second to none.

Be Part of a Great Team

Join a workforce that thrives on collaboration and diversity. Royal Caribbean Group offers a variety of job opportunities that harness the capabilities of its expansive global network.

Future-proof Your Career

Royal Caribbean Group provides a wealth of opportunities for personal and professional development, supported by comprehensive training programs and a commitment to promoting from within.

Explore

Discover how Royal Caribbean Group is leading the way in employee satisfaction and guest service, setting new standards in the cruise industry.

The Royal Caribbean Group Advantage

With a focus on diversity, leadership, and professional growth, Royal Caribbean Group helps team members navigate their careers in an ever-evolving industry. The company's global scale and commitment to innovation offer unmatched opportunities for career advancement.

Stay Connected

Join the Team

Search open positions that match your skills and interests. Royal Caribbean Group looks for passionate, curious, creative, and solution-driven team players.

Keep Up to Date

Stay ahead with career tips, insider perspectives, and industry-leading insights you can put to use today, all from the professionals who work at Royal Caribbean Group.

Job Alert Emails

Personalize your subscription to receive job alerts, latest news, and insider tips tailored to your preferences. Explore the exciting and rewarding opportunities that await at Royal Caribbean Group.
Size: 10,001 employees