Job Description
Key Responsibilities:
Design, develop, and maintain ETL (Extract, Transform, Load) processes to ensure the seamless integration of raw data from various sources into our data lakes or warehouses.
Utilize Python, PySpark, SQL, and AWS services such as Lambda, Glue, Redshift, and S3 to process, analyze, and store large-scale datasets efficiently.
Develop and optimize data pipelines using tools such as AWS Glue for ETL tasks, PySpark for big data processing, and Python for scripting and automation; a minimal pipeline sketch follows after this list. Experience with Apache Spark/Databricks is highly desirable for advanced ETL workflows.
Write and maintain SQL queries for data retrieval, transformation, and storage in relational databases such as Redshift or PostgreSQL; an illustrative query sketch also follows after this list.
Collaborate with cross-functional teams, including data scientists, engineers, and domain experts, to design and implement scalable solutions.
Troubleshoot and resolve performance issues, data quality problems, and errors in data pipelines; a data-quality check sketch follows after this list.
Document processes, code, and best practices for future reference and team training.
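
For illustration, here is a minimal PySpark ETL sketch of the kind of pipeline described above: it extracts raw CSV files from S3, applies a simple transformation, and loads the result as partitioned Parquet. The bucket names, paths, and column names (example-raw-bucket, orders, order_total) are placeholders, not details of this role.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    # Extract: read raw CSV files from an S3 landing zone (placeholder path).
    raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

    # Transform: cast types, drop rows missing required fields, and derive a
    # partition column from the order timestamp.
    clean = (
        raw.withColumn("order_total", F.col("order_total").cast("double"))
           .dropna(subset=["order_id", "order_total"])
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Load: write partitioned Parquet to the curated zone of the data lake.
    (clean.write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://example-curated-bucket/orders/"))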
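
Similarly, a hedged sketch of running a SQL transformation from Python against PostgreSQL (Redshift speaks the same wire protocol, so the same pattern applies with its driver). The connection details and the staging.orders / analytics.daily_order_totals tables are illustrative assumptions.

    import psycopg2

    TRANSFORM_SQL = """
        INSERT INTO analytics.daily_order_totals (order_date, total_amount)
        SELECT order_ts::date AS order_date, SUM(order_total) AS total_amount
        FROM staging.orders
        WHERE order_total IS NOT NULL
        GROUP BY order_ts::date
    """

    # Connection parameters are placeholders; in practice they would come from
    # a secrets manager or environment variables, never hard-coded.
    conn = psycopg2.connect(
        host="example-host",
        dbname="example_db",
        user="example_user",
        password="example_password",
    )
    try:
        # `with conn` opens a transaction and commits it if the block succeeds.
        with conn, conn.cursor() as cur:
            cur.execute(TRANSFORM_SQL)
    finally:
        conn.close()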
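
Finally, a small data-quality sketch, assuming a PySpark environment: it surfaces null rates and duplicate keys before a load so pipeline errors are caught early. The source path and column names are again hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()

    # Placeholder path; point this at the dataset under investigation.
    df = spark.read.parquet("s3://example-curated-bucket/orders/")
    total = df.count()

    # Null rate per critical column: sudden spikes usually point at an
    # upstream schema or extraction change.
    for col in ["order_id", "order_total"]:
        nulls = df.filter(F.col(col).isNull()).count()
        print(f"{col}: {nulls / max(total, 1):.2%} null")

    # Duplicate keys are a common cause of downstream join and aggregate errors.
    dupes = df.groupBy("order_id").count().filter(F.col("count") > 1).count()
    if dupes:
        raise ValueError(f"{dupes} duplicate order_id values; aborting the load")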
Additional Information:
Strong understanding of data governance, security, and compliance principles is preferred.
Ability to work independently and as part of a team in a fast-paced environment.
Excellent problem-solving skills with the ability to identify inefficiencies and propose solutions.
Experience with version control systems (e.g., Git) and scripting languages for automation tasks.
If you have what it takes to join our team as a Data Engineer, we encourage you to apply.