Big Data Engineer

Teranet, Inc.

$110K — $130K *
Information Technology
5 - 7 years of experience
Job Overview by Ladders

Qualifications

  • 5+ years of experience with modern data platforms and medallion architecture
  • Expertise in Spark, PySpark, SQL, and Bash
  • Extensive experience developing ETLs and processing large datasets
  • In-depth knowledge of data pipelines using Airflow and HVR/Debezium
  • Strong Python and Shell coding skills, following best practices
  • Familiarity with AWS services, particularly S3 and EKS
  • Excellent written and verbal communication skills

Responsibilities

  • Design and orchestrate data pipelines using Airflow
  • Integrate diverse data sources with Python and Debezium
  • Transform and optimize large data volumes for performance
  • Collaborate with teams to enhance data delivery and models
  • Ensure data quality and accuracy during ingestion
  • Establish data security and governance controls
  • Create documentation for best practices and processes

Benefits

  • 100% Employer Paid Health Benefit Plan
  • Employer Matching Retirement Savings Plan
  • Paid Vacation, Floater Days & Sick Leave
  • Maternity, Parental, and/or Adoption Leave Top-Up Programs
  • Corporate Discounts & GoodLife Group Rate Membership
  • Employee Assistance Program for you and your loved ones
Full Job Description

About the Role

Teranet is seeking an experienced Data Engineer for its Enterprise Data Analytics team within the IT line of business. This is a pure data engineering role focused on processing large-scale data and optimizing Spark performance, not a Data Analyst or BI Developer position.

In this role, you will:

  • Design, build, and orchestrate data pipelines using tools like Airflow.

  • Integrate data from diverse sources such as RDBMS systems (via Debezium) and files (CSV, JSON, XML, text) using Python.

  • Be responsible for transforming and processing large volumes of data, optimizing Spark execution plans for performance and cost efficiency, and writing complex SQL and Python code.

  • Work closely with product owners, data analysts, and other data engineers to develop and enhance data pipelines and data models that support data delivery, AI development, and BI insights.

  • Configure data ingestion frameworks for databases like Oracle, MS SQL Server, and PostgreSQL, while ensuring data accuracy, quality, and proper curation to meet various business use cases.

  • Partner with infrastructure teams to enforce data security, access controls, and auditability, ensuring that data is properly governed, secure, and compliant across the organization.

What You’ll Be Doing

  • Participate in planning with business product owners and data analysts, and identify tasks for the data analytics team.

  • Design and develop data pipelines (Debezium and HVR), curate data for enterprise-wide usage, and prepare data models for specific use cases.

  • Develop test objectives, test plans, and success criteria (connectivity, data replication, automatic failover, peak-load performance, etc.).

  • Work with infrastructure, security, and networking teams to ensure connectivity requirements are met for data pipeline sources and targets.

  • Tune data ingestion and replication to meet performance targets.

  • Configure the CDC framework as required to create daily/weekly/monthly data snapshots within acceptable performance targets.

  • Design and implement technology best practices, guidelines, and repeatable processes. Create design documents, test plans, and Confluence documentation.

  • Self-direct, prioritize, and complete assignments with minimal supervision.

About You

  • 5+ years of experience with modern data platforms, medallion architecture, configuring data pipelines, ETL, RDBMS, SQL, Spark, PySpark, Bash, and Linux.

  • Familiarity with AWS S3, Hive Metastore, Trino, Airflow, and Control-M.

  • Knowledge of CDC-based data ingestion setup, preferably using HVR and Debezium.

  • Deep Hive/Spark/SQL knowledge, development, and testing experience

  • Expert Python/Spark/Shell development skills and command of coding best practices

  • Excellent day-to-day working knowledge of Git with exposure to Gerrit

  • Extensive experience in developing ETLs and processing large datasets

  • Experience with Airflow data pipeline orchestration

  • Experience building data models to support BI data visualizations using Tableau

  • Familiarity with AWS services such as S3 and EKS, and with kubectl, is highly beneficial

  • Excellent written and verbal communication skills

Let’s Talk Pay

We believe in being upfront about pay and helping you make informed decisions about your career. The annual pay range for this role is $110,000-$130,000 inclusive of base salary and target incentive pay. We understand that great talent comes in many forms, each with unique skills, experience, and potential. Your salary will be tailored to reflect the experience you bring and the impact you’re ready to have on this role.

At Teranet, we also know that compensation extends far beyond a pay cheque. Along with your cash compensation, we offer a comprehensive package which includes the following:

  • 100% Employer Paid Health Benefit Plan

  • Employer Matching Retirement Savings Plan

  • Paid Vacation, Floater Days & Sick Leave

  • Maternity, Parental and/or Adoption Leave Top-Up Programs

  • Corporate Discounts & GoodLife Group Rate Membership

  • Employee Assistance Program – for you and your loved ones!