Job Summary
The Big Data Engineer will be responsible for designing, developing, optimizing, and supporting scalable big data platforms and cloud-native applications within AWS environments. This role requires expertise in both batch processing and API-driven architectures, leveraging technologies such as Spark, Hadoop, Scala, PySpark, Java, and Snowflake. The ideal candidate will have strong experience building enterprise-grade data pipelines, microservices, and cloud-based analytics solutions while ensuring performance, scalability, and reliability.
Key Responsibilities
• Design, develop, implement, test, and maintain Big Data applications supporting both batch and API-based workloads.
• Build and optimize scalable data ingestion, transformation, and processing pipelines using Spark and related technologies.
• Develop and maintain cloud-native applications and data platforms within AWS environments.
• Design, develop, and support REST APIs and SOAP web services for data and application integration.
• Build and maintain microservices architectures using AWS services and modern development frameworks.
• Develop scalable batch processing solutions using Hadoop, Spark, EMR, and distributed computing technologies.
• Optimize data processing jobs and applications for performance, scalability, and cost efficiency.
• Develop and support data workflows using AWS Step Functions, Apache Airflow, and related orchestration tools.
• Design and implement data ingestion and transformation pipelines within Snowflake environments.
• Work with large-scale structured and unstructured datasets across distributed systems.
• Develop automation solutions using Python, shell scripting, and cloud-native tools.
• Support and maintain Cassandra databases and distributed data storage solutions.
• Troubleshoot production issues and perform root cause analysis across data platforms and applications.
• Collaborate with cross-functional teams to gather requirements and deliver scalable solutions.
• Participate in code reviews, testing, deployment activities, and continuous improvement initiatives.
• Maintain technical documentation and operational procedures.
Required Qualifications
• 7+ years of experience in the analysis, development, implementation, and testing of Big Data applications.
• Strong experience working within AWS cloud environments.
• Strong programming experience with Scala.
• Strong experience with PySpark and Apache Spark.
• Strong Java development experience, including REST APIs and SOAP web services.
• Experience with Hadoop ecosystem technologies, including HDFS, MapReduce, and Spark.
• Hands-on experience with Cassandra.
• Experience building and supporting applications using AWS services, including:
    • Amazon EMR
    • Amazon EC2
    • Amazon ECS
    • Amazon S3
    • AWS Step Functions
    • Amazon API Gateway
• Experience working with both batch processing systems and API-driven microservices architectures.
• Strong experience with performance tuning and optimization of Spark, Hadoop, and EMR workloads.
• Strong Linux administration and troubleshooting experience.
• Experience with shell scripting and Python automation.
• Experience building data ingestion and transformation pipelines using Snowflake and Spark.
• Strong analytical, troubleshooting, and problem-solving skills.
• Ability to work independently in a remote environment.
• Strong communication and collaboration skills.
Preferred Qualifications
• Experience with Python and Kotlin development.
• Experience with Apache Airflow for workflow orchestration.
• Experience with version control tools and platforms, including Git, GitHub, GitLab, and Bitbucket.
• Experience with development tools such as IntelliJ IDEA and PyCharm.
• Experience implementing CI/CD pipelines using Maven, Gradle, Jenkins, and Artifactory.
• Experience with AWS CodeCommit and AWS CloudFormation.
• Experience with cloud-native architecture patterns and distributed systems design.
• Experience supporting enterprise-scale analytics and data platforms.
• Knowledge of data governance, security, and operational best practices.
Required Skills
• AWS (EMR, EC2, ECS, S3, Step Functions, API Gateway)
• Scala
• PySpark
• Java
• Hadoop
• HDFS
• Apache Spark
• MapReduce
• Cassandra
• Snowflake
• Linux
• Shell Scripting
• Python
• REST APIs
• SOAP Web Services
• Data Pipeline Development
• Batch Processing
• Performance Tuning
• Microservices Architecture