Senior PySpark Data Engineer

Tata Consultancy Services • $125K — $140K *

Irving, TX 75061 • In-Person

Information Technology

Less than 5 years of experience

Today

Job Overview by Ladders

Qualifications

5-7 years of experience in data engineering roles
High proficiency in Apache Spark architecture and its components
Exceptional skills in Python and PySpark for ETL processes
Strong command of HiveQL and ANSI SQL
Hands-on experience in cloud environments like AWS, Azure, or GCP
Foundation in Dimensional Data Modeling with practical Data Lakes experience

Responsibilities

Design, build, and maintain scalable ETL/ELT data pipelines using PySpark and Spark SQL
Deploy and manage data infrastructure on AWS, Azure, or GCP
Optimize data layout and access in Apache Hive and cloud data lakes
Identify and resolve performance issues in Spark jobs through optimization techniques
Develop solutions for high-volume data ingestion from various sources
Implement automated workflows using Apache Airflow or other scheduling tools
Collaborate with data scientists and analysts to translate business needs into data solutions

Benefits

Access to leading technologies and tools
Opportunity for continuous learning and professional development
Collaboration with cross-functional teams for diverse projects
Flexible work environment with potential for remote work
Support for obtaining professional cloud certifications

Full Job Description

Job Title: Data Engineer

Job Description:

We are seeking a highly skilled and motivated Data Engineer to play a pivotal role in designing, building, and optimizing our next-generation scalable data pipelines. This position requires expertise in processing massive datasets using cutting-edge technologies like Apache Spark, PySpark, and Hive within a dynamic cloud environment. Your primary objective will be to ensure the utmost data reliability, speed, and efficiency, providing a robust foundation for downstream business intelligence and advanced analytics initiatives.

Roles & Responsibilities:
• Data Pipeline Development & Maintenance: Design, build, and maintain highly scalable and efficient ETL/ELT data pipelines utilizing PySpark and Spark SQL for complex data transformations.
• Cloud Data Infrastructure Management: Deploy, manage, and scale critical data infrastructure components on leading cloud platforms such as Amazon Web Services (AWS) (e.g., EMR, Glue), Microsoft Azure (e.g., Databricks, Synapse), or Google Cloud Platform (GCP).
• Data Warehousing & Storage Optimization: Strategically manage data layout, partitioning, and indexing within Apache Hive and various cloud data lake solutions to optimize performance and accessibility.
• Performance Tuning & Optimization: Proactively identify and resolve performance bottlenecks in Spark jobs, leveraging Spark UI for in-depth analysis, effectively managing data skewness, and optimizing memory utilization.
• Diverse Data Integration: Develop robust solutions for ingesting high-volume and diverse datasets from both structured relational databases and unstructured flat files into our data ecosystem.
• Automated Workflow Orchestration: Implement and manage automated data workflows using industry-standard scheduling tools like Apache Airflow or platform-native schedulers, ensuring timely and reliable data delivery.
• Strategic Collaboration: Partner closely with data scientists, business analysts, and cross-functional enterprise teams to translate complex business requirements into technically sound and efficient data solutions.

Qualifications:
• Big Data Frameworks Expertise: Demonstrated high proficiency in Apache Spark architecture, including a deep understanding of drivers, executors, and Directed Acyclic Graphs (DAGs).
• Advanced Programming: Exceptional coding skills in Python and extensive experience with the PySpark API for developing intricate data transformations and processing logic.
• Querying & Schema Management: Strong command of HiveQL and ANSI SQL, coupled with expertise in data partitioning techniques and effective schema definition.
• Optimized Storage Formats: In-depth understanding and practical experience with optimized big data storage file formats such as Parquet, ORC, and Avro.
• Cloud Ecosystem Development: Hands-on development experience using cloud-native big data utilities (e.g., AWS EMR, Azure Databricks) within major cloud platforms.
• Data Warehousing Fundamentals: Solid foundation in Dimensional Data Modeling, including Star and Snowflake schemas, and practical experience with Data Lakes concepts and implementation.

Preferred Qualifications:
• CI/CD & DevOps Automation: Experience with Continuous Integration/Continuous Deployment (CI/CD) practices and automation tools like Git, Jenkins, or Ansible.
• NoSQL Database Integration: Exposure to and experience with NoSQL databases such as HBase, Cassandra, or MongoDB.
• Professional Cloud Certifications: Relevant professional cloud certifications (e.g., AWS Certified Data Engineer, Microsoft Certified: Azure Data Engineer Associate) are highly valued.

Salary Range: $125,000 to $140,000 per year

About Tata Consultancy Services

Tata Consultancy Services (TCS) is an Indian multinational information technology (IT) services and consulting company, headquartered in Mumbai, Maharashtra, India. It is a subsidiary of Tata Group and operates in 149 locations across 46 countries. TCS is the largest Indian company by market capitalization and is ranked 11th on the Forbes Global 2000 list of the world's biggest public companies. TCS is also the second-largest IT services company in the world by revenue and the largest employer of women in India. The company provides services in areas including IT, consulting, and business solutions.

Size

469,261 employees

Industry

Information Technology

* Ladders Estimates
