Full Job Description
We are looking for a skilled Apache Druid Administrator to join our Data Infrastructure team. In this role, you will design, deploy, and manage large-scale Apache Druid clusters that power petabyte-scale analytical workloads. You will work closely with platform architects and business teams to ensure the high availability, performance, and reliability of our real-time analytics platform.
Responsibilities
Design, deploy, administer, and optimize large-scale Apache Druid clusters supporting PB-scale datasets.
Manage and troubleshoot Druid services (Coordinator, Overlord, Broker, Historical, MiddleManager, Router).
Perform cluster upgrades, patching, capacity planning, and platform modernization activities.
Monitor cluster health, ingestion performance, query latency, segment distribution, and resource utilization.
Troubleshoot ingestion failures, stuck tasks, compaction issues, retention policy misconfigurations, and segment management problems.
Optimize Druid queries, partitioning strategies, indexing specifications, and compaction configurations.
Configure and maintain Druid metadata stores using MySQL.
Develop operational automation using Ansible and Infrastructure-as-Code methodologies.
Create dashboards, alerts, and monitoring solutions using Grafana, Prometheus, and logging platforms.
Work with platform teams to ensure high availability, disaster recovery, backup, and security compliance.
Participate in production support, incident response, root cause analysis, and performance tuning initiatives.
Collaborate with architects and business teams to understand analytical requirements and translate them into scalable solutions.
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field, or an equivalent combination of education and relevant experience.
7+ years of hands-on experience administering Apache Druid in production environments at scale.
Strong understanding of Druid architecture: Coordinator, Overlord, Broker, Historical, MiddleManager, Router.
Proficiency in Druid ingestion specifications (native batch, Kafka, Hadoop-based).
Experience with compaction, retention policies, and segment lifecycle management.
Expertise in query optimization and partitioning strategies for large datasets.