RTW Investments is seeking a Data Engineer to help design and maintain lightweight ontologies and schemas, build reliable data pipelines in Databricks on Azure, and support graph-backed use cases (entity linking, relationship modeling, semantic search).
Requirements
- Proficiency in Python and SQL; comfort with PySpark for distributed transforms.
- Hands-on experience with Databricks (notebooks, jobs/workflows) and Delta Lake fundamentals.
- Working knowledge of Azure data services (at least ADLS Gen2 and Key Vault).
- Foundational knowledge graph (KG) concepts: nodes/edges/properties, ontologies/taxonomies, schemas; ability to explain how a table maps to a graph model (illustrated after this list).
- Exposure to at least one KG tool or language (e.g., Neo4j/Cypher, RDF/OWL, SPARQL); academic or project experience is acceptable.
- Strong attention to detail, documentation habits, and version control (Git).
- Familiarity with the Neo4j ecosystem (Neo4j Desktop, Aura, APOC, py2neo, or similar), with Stardog, or with Azure/AWS managed graph services is a plus.
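To make the table-to-graph mapping concrete, here is a minimal, self-contained sketch; the "holdings" table, its columns, and the Fund/Company/HOLDS labels are hypothetical examples, not a prescribed schema:

```python
# Hypothetical "holdings" table: each row yields two nodes (Fund, Company)
# and one HOLDS edge carrying the remaining column as an edge property.
rows = [
    {"fund_id": "F1", "fund_name": "Alpha Fund",
     "company_id": "C9", "company_name": "Acme Bio", "position_pct": 4.2},
]

nodes, edges = {}, []
for r in rows:
    # A business key identifies each node; descriptive columns become properties.
    nodes[("Fund", r["fund_id"])] = {"name": r["fund_name"]}
    nodes[("Company", r["company_id"])] = {"name": r["company_name"]}
    # The foreign-key pair becomes a typed, property-bearing relationship.
    edges.append(("HOLDS", r["fund_id"], r["company_id"],
                  {"position_pct": r["position_pct"]}))

print(nodes)  # {('Fund', 'F1'): {'name': 'Alpha Fund'}, ...}
print(edges)  # [('HOLDS', 'F1', 'C9', {'position_pct': 4.2})]
```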
Responsibilities
- Implement and maintain basic ETL/ELT pipelines on Databricks (PySpark, SQL, Delta Lake) to ingest, transform, and publish curated datasets.
- Contribute to KG modeling: draft and extend ontologies/taxonomies, define schemas (entities, relationships, properties), and document naming conventions.
- Build “graph ETL” flows to load nodes/edges into a KG tool (e.g., Stardog or Neo4j) from tabular sources (CSV, Delta tables), including upsert logic and basic data quality checks (sketched after this list).
- Author queries over the graph (e.g., Cypher or SPARQL) to validate relationships and support downstream analytics.
- Collaborate with data scientists/analysts to understand entity definitions, resolve identity (de-duplication, matching), and map source systems to the KG.
- Maintain reproducible, version-controlled jobs (Git) and contribute to simple CI checks (lint, tests).
- Write clear technical docs (schemas, lineage notes, how to run jobs) and contribute to the team knowledge base.
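As a hedged, end-to-end sketch of how the pipeline and graph-load responsibilities could fit together: the table names, columns, labels, and Neo4j credentials below are hypothetical, and the snippet assumes a Databricks notebook where `spark` is predefined plus a reachable Neo4j instance.

```python
# Illustrative sketch only: curate a Delta table with PySpark, then upsert
# the result into Neo4j. All names (tables, URI, columns) are assumptions.
from pyspark.sql import functions as F
from neo4j import GraphDatabase  # official Neo4j Python driver

# 1. ETL on Databricks: read raw data, apply basic quality checks, publish
#    a curated Delta table (assumes `spark` exists, as in a notebook).
raw = spark.read.table("raw.holdings")
curated = (
    raw.dropDuplicates(["fund_id", "company_id"])  # de-duplicate on business key
       .withColumn("position_pct", F.col("position_pct").cast("double"))
)
curated.write.format("delta").mode("overwrite").saveAsTable("curated.holdings")

# 2. Graph ETL with upsert logic: Cypher MERGE keeps the load idempotent,
#    so re-runs update existing nodes/edges instead of duplicating them.
UPSERT = """
MERGE (f:Fund {id: $fund_id})
MERGE (c:Company {id: $company_id})
MERGE (f)-[h:HOLDS]->(c)
SET h.position_pct = $position_pct
"""

driver = GraphDatabase.driver("neo4j://localhost:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    for row in curated.collect():  # row-at-a-time is fine for small tables
        session.run(UPSERT, fund_id=row["fund_id"],
                    company_id=row["company_id"],
                    position_pct=row["position_pct"])

    # 3. Validate relationships with a Cypher query before downstream use.
    count = session.run(
        "MATCH (:Fund)-[:HOLDS]->(:Company) RETURN count(*) AS n"
    ).single()["n"]
    print(f"HOLDS edges loaded: {count}")
driver.close()
```

For larger tables, batched `UNWIND` loads or the Neo4j Connector for Apache Spark would replace the per-row loop; the sketch favors readability over throughput.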
Other
- 2–3 years of experience in data engineering, analytics engineering, or similar (internships/co-ops count).
- Curiosity about graph modeling and how semantics improve analytics.
- Pragmatism: start simple, iterate, measure.
- Clear communication, code readability, and consistent documentation.
- Ownership and a growth mindset; you seek feedback and improve quickly.