GCP Kubernetes SRE

Prophecy Technologies

• $120K — $150K *

Scottsdale, AZ 85254

In-Person

Information Technology

8 - 10 years of experience

1 month ago

Job Overview by Ladders

Qualifications

10+ years in Site Reliability Engineering or related field
Google Cloud Architect Certification preferred
Certified Kubernetes Administrator (CKA) preferred
Proficient in Python, Ansible, and Node.js
Strong experience with incident management and production triage
Hands-on experience with automation and CI/CD pipelines
Deep understanding of AI/ML concepts and AIOps practices

Responsibilities

Manage incidents and provide on-call support for system stability
Develop and maintain automation scripts and CI/CD pipelines
Implement and manage infrastructure using Terraform, Helm, and GitHub Actions
Monitor system performance using Prometheus and Grafana
Apply AI/ML and AIOps practices to enhance operational efficiency
Support ML/AI platforms and integrate AI-driven automation into incident response

Benefits

Flexible work environment
Opportunities for professional growth and certification
Collaboration with a talented engineering team
Access to cutting-edge technologies and tools
Support for continuing education and learning initiatives

Full Job Description

Role Overview:

This role is for a highly skilled Site Reliability Engineer with strong expertise in Kubernetes and Google Cloud Platform (GCP), specifically GKE. The position requires deep experience with infrastructure as code (IaC) using Terraform, along with Helm and GitHub Actions, and proficiency in Python, Ansible, and Node.js. The engineer will maintain and enhance the Prometheus and Grafana observability stack, apply solid Linux and networking fundamentals, and contribute to automation and CI/CD pipelines. A significant part of the role involves applying AI/ML concepts and AIOps practices to improve system reliability and incident management.

Key Responsibilities:

Manage incidents, provide on-call support, and perform production triage to ensure system stability.
Develop and maintain automation scripts and CI/CD pipelines for efficient software delivery and infrastructure management.
Implement and manage infrastructure using IaC principles with Terraform, Helm, and GitHub Actions.
Monitor system performance and health using Prometheus and Grafana observability tools.
Apply AI/ML concepts and AIOps practices, including model lifecycle management, monitoring, and AI-driven alerting, to enhance operational efficiency.
Support and operate ML/AI platforms or pipelines (MLOps) and integrate AI-driven automation into monitoring and incident response.

Required Skills:

Strong experience with Kubernetes and GCP (GKE).
Strong experience in IaC (Terraform), Helm, and GitHub Actions.
Proficiency in Python, Ansible, and Node.js.
Strong experience with Prometheus and Grafana observability stack.
Solid understanding of Linux systems and networking fundamentals.
Experience in incident management, on-call support, and production triage.
Hands-on experience with automation and CI/CD pipelines.
Strong understanding of AI/ML concepts and AIOps practices (model lifecycle, monitoring, or AI-driven alerting).

Qualifications:

10+ years of experience in Site Reliability Engineering or a related field.
Google Cloud Architect Certification (Preferred).
Certified Kubernetes Administrator (CKA) (Preferred).

Preferred Skills:

Experience with Java/J2EE and Spring Boot.
Experience supporting or operating ML/AI platforms or pipelines (MLOps).
Exposure to AIOps tools, anomaly detection, or predictive analytics systems.
Experience with large-scale distributed systems and microservices architecture.
Experience with GPU-based workloads or ML infrastructure on GCP.
Knowledge of Kubeflow, Vertex AI, or ML pipelines.
Experience integrating AI-driven automation into monitoring and incident response.

* Ladders Estimates
