Define and implement platform-wide observability and telemetry standards for data workloads.
Enable consistent metrics, logging, alerting, and health signals across platforms and pipelines.
Ensure operational signals are actionable and can drive automated response.
Self-Service Operational Tooling
Build self-service operational tooling that enables engineering teams to debug, monitor, and resolve issues independently.
Reduce operational dependencies on centralized teams and vendors.
Empower teams with standardized dashboards, insights, and tooling.
Automated Cost Governance & Optimization
Establish automated cost governance and optimization frameworks for data platforms.
Improve cost visibility through telemetry, alerts, and policy-based controls.
Drive proactive cost management rather than reactive spend reviews.
Required Skills & Experience
Strong experience in Data Operations, Data Platform Operations, or SRE/DataOps roles.
Proven experience reducing operational toil through automation-first approaches.
Hands-on experience with incident management, observability, and operational tooling.
Experience operating within, or transforming, managed-service or vendor-led operating models.
Strong understanding of cost governance and optimization in data platforms.
Hands-on experience with Microsoft Azure data platform services such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Microsoft Fabric.
Experience implementing CI/CD and release automation using Azure DevOps, including Azure Pipelines, repository management, deployment workflows, and environment-based promotion controls.
Experience with Microsoft Entra ID, Azure Key Vault, and Microsoft Purview for identity, secrets management, governance, data lineage, and compliance controls.
Strong scripting and engineering skills in Python, SQL, and PowerShell to build automation, operational runbooks, telemetry pipelines, and remediation workflows.