SRE / Hadoop Admin

Prophecy Technologies

$120K - $160K
Enterprise Technology
8 - 10 years of experience
Job Overview by Ladders

Qualifications

  • 10+ years in Platform Engineering, Site Reliability Engineering, or similar roles with proven experience in managing large-scale Hadoop infrastructure.
  • Deep expertise in the Hadoop ecosystem, especially HDFS, YARN, Hive, Spark, NiFi, Ambari, and Iceberg.
  • Strong Linux system administration skills, ideally on CentOS or Rocky Linux, with a focus on tuning and troubleshooting.
  • Proficient in containerization and orchestration using Docker and Kubernetes.
  • Experience with automation and Infrastructure as Code, specifically using GitLab CI/CD and scripting in Python and Bash.
  • Knowledge of monitoring tools like Prometheus and Grafana, with an emphasis on system health and alerting.
  • Practical understanding of networking concepts, security protocols, and data compliance regulations.

Responsibilities

  • Own and operate the end-to-end infrastructure of a large-scale, on-prem Hadoop-based data platform.
  • Design, implement, and maintain core platform components including Hadoop, Hive, Spark, and others.
  • Automate infrastructure management, monitoring, and deployments using CI/CD pipelines.
  • Implement and enforce security controls, access management, and compliance standards.
  • Perform upgrades, patching, performance tuning, and troubleshooting of platform components.

Benefits

  • Mentorship opportunities to lead and guide engineering teams.
  • Exposure to working with cutting-edge technology in a petabyte-scale environment.
  • Hands-on experience with automation and cloud-native toolsets.
  • Collaborative work culture fostering innovation and best practices.

Full Job Description

Purpose:

Seeking a highly experienced Senior or Lead Platform Engineer/Site Reliability Engineer (SRE)/Hadoop Admin to manage and enhance our petabyte-scale, on-premises data platform. This platform is built using the open-source Hadoop ecosystem. The ideal candidate possesses in-depth technical expertise, a solid understanding of distributed systems, and extensive experience in operating and optimizing large-scale data infrastructures. This role requires a hands-on technical leader who can drive platform innovation, ensure high availability and reliability, and mentor team members in best practices for performance, automation, and resiliency.

Hadoop Admin / SRE | Fountain Valley, CA

Essential Functions:
• Own and operate the end-to-end infrastructure of a large-scale, on-prem Hadoop-based data platform, ensuring high availability and reliability.
• Design, implement, and maintain core platform components, including Hadoop, Hive, Spark, NiFi, Iceberg, ELK, OpenSearch, and Ambari.
• Automate infrastructure management, monitoring, and deployments using CI/CD pipelines (GitLab) and scripting.
• Implement and enforce security controls, access management, and compliance standards.
• Perform system upgrades, patching, performance tuning, and troubleshooting across platform components.

Basic Requirements:
• 10+ years of experience in Platform Engineering, Site Reliability Engineering, or similar roles, with proven success managing large-scale, distributed Hadoop infrastructure.
• Deep expertise in the Hadoop ecosystem, including HDFS, YARN, Hive, Spark, NiFi, Ambari, and Iceberg.
• Strong Linux system administration skills (CentOS/Rocky preferred), including system tuning, performance optimization, and troubleshooting.
• Proficiency in containerization and orchestration using Docker and Kubernetes.
• Solid experience with automation and Infrastructure as Code, using tools such as GitLab CI/CD and scripting in Python and Bash.
• Practical knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry) and understanding of system health, alerting, and telemetry.
• Familiarity with networking concepts, security protocols, and data compliance requirements.
• Experience managing petabyte-scale data platforms and implementing disaster recovery strategies.
• Understanding of data governance, metadata management, and operational best practices.
• Demonstrated ability to lead technical projects, mentor engineers, and collaborate effectively with cross-functional teams.
• Excellent problem-solving, communication, and leadership skills.
