People Data Labs (PDL) is a provider of people and company data. They aim to build the best data available by integrating thousands of compliantly sourced datasets into a single, developer-friendly source of truth. Leading companies across the world use PDL’s workforce data to enrich recruiting platforms, power AI models, create custom audiences, and more. They are looking for a Data Engineer to help them solve complex data problems.
Requirements
- 5-7+ years of industry experience with clear examples of strategic technical problem-solving and implementation
- Strong software development fundamentals
- Experience with Python
- Expertise with Apache Spark (Java, Scala, and/or Python-based)
- Experience with SQL
- Experience building scalable data processing systems (e.g., cleaning, transformation) from the ground up
- Experience using developer-oriented data pipeline and workflow orchestration tools (e.g., Airflow (preferred), dbt, Dagster, or similar)
Responsibilities
- Build infrastructure for ingesting, transforming, and loading an exponentially increasing volume of data from a variety of sources using Spark, SQL, AWS, and Databricks
- Build an organic entity resolution framework capable of correctly merging hundreds of billions of individual entities into a number of clean, consumable datasets
- Develop CI/CD pipelines and anomaly detection systems capable of continuously improving the quality of data pushed into production
- Dream up solutions to largely undefined data engineering and data science problems
Other
- Balance high ownership and autonomy with a strong ability to collaborate
- Work effectively remotely (proactively managing blockers, reaching out with questions, and participating in team activities)
- Demonstrate strong written communication skills on Slack/Chat and in documents
- Exhibit experience in writing data design docs (pipeline design, dataflow, schema design)
- Scope and break down projects, communicate progress and blockers, and collaborate effectively with your manager, team, and stakeholders