Role description
Mandatory skills
Airflow: advanced DAG design, dynamic task mapping, deferrable operators, SLAs, backfills, cross-DAG dependencies
Kubernetes: Helm, autoscaling, pod/node tuning, network policies, PDBs, canary/blue-green deployments
Python: operators/hooks development
Terraform, GitOps (Argo CD/Flux), CI/CD
Observability: Prometheus, Grafana, logging
Cloud Data Platforms: AWS/Azure/GCP, Snowflake/BigQuery/Redshift, Spark/Databricks, EMR/Dataproc
Security: IAM, RBAC, OIDC/SSO, Vault/Secrets Manager
SRE, Incident Management
Bash scripting
5 to 8 years building and operating data or platform systems; 3 years running Airflow in production at scale (hundreds/thousands of DAGs and high task throughput)
Deep Airflow expertise: DAG design and testing, idempotency, deferrable operators/sensors, dynamic task mapping, task groups, datasets, pools/queues, SLAs, retries/backfills, cross-DAG dependencies
Strong Kubernetes experience running Airflow and supporting services: Helm, autoscaling, node/pod tuning, topology spread, network policies, PDBs, and blue-green or canary strategies
Automation-first mindset: Terraform, Helm, GitOps (Argo CD/Flux), and CI/CD for platform lifecycle; policy-as-code (OPA/Gatekeeper/Conftest) for DAG, connection, and secrets changes
Proficiency in Python for authoring operators/hooks/utilities; solid Bash; familiarity with Go or Java is a plus
Observability and SRE practices: Prometheus/Grafana/StatsD, centralized logging design, capacity/throughput modeling, performance tuning
Data platform experience with at least one major cloud (AWS/Azure/GCP) and systems like Snowflake/BigQuery/Redshift, Databricks/Spark, EMR/Dataproc; strong grasp of IAM, VPC networking, and storage (S3/GCS/ADLS)
Security/compliance: SSO/OIDC, RBAC, secrets management (Vault/Secrets Manager), auditing, least-privilege connection management, and change control
Proven incident leadership, runbook creation, and platform roadmap execution; excellent cross-functional communication