Shape the future of our data platform with a focus on small data at scale, bridging the gap between lightweight embedded tools and cloud-scale systems.
Requirements
- Deep expertise in SQL (window functions, CTEs, optimization).
- Strong Python skills with data libraries (e.g., Polars, PyArrow).
- Proficiency with DuckDB (extensions, Parquet/Iceberg integration, embedding in pipelines); see the sketch after this list.
- Hands-on with columnar formats (Parquet, Arrow, ORC) and schema evolution.
- Expertise in Kubernetes and Helm.
- Cloud storage experience (AWS S3, GCS).
- Experience with semantic layer frameworks (e.g., Cube.js).
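For illustration, a minimal sketch of the kind of DuckDB-over-Parquet work this role involves: a CTE plus window function evaluated directly over Parquet files. The file path and column names are hypothetical.

```python
# Minimal sketch: querying Parquet in place with DuckDB.
# The file path and column names are hypothetical.
import duckdb

con = duckdb.connect()  # in-process, no server to manage

# CTE + window function over Parquet files read directly from disk
# (the same pattern works against s3:// paths with the httpfs extension).
top_orders = con.execute("""
    WITH ranked AS (
        SELECT
            customer_id,
            order_id,
            order_total,
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_total DESC
            ) AS order_rank
        FROM read_parquet('data/orders/*.parquet')
    )
    SELECT customer_id, order_id, order_total
    FROM ranked
    WHERE order_rank <= 3
""").fetch_df()

print(top_orders.head())
```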
Responsibilities
- Design and implement data processing workflows using DuckDB, Polars, and Arrow/Parquet (see the sketch after this list).
- Balance small-data local pipelines with cloud data warehouse backends (e.g., Snowflake).
- Advocate for efficient, vectorized, local-first approaches where appropriate.
- Drive best practices for designing reproducible and testable data workflows.
- Partner with data science, professional services, and product engineering teams to define semantic data layers.
- Provide technical leadership in how data is versioned, validated, and surfaced for downstream use.
- Establish standards for CI/CD, observability, and reliability in data pipelines.
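As referenced above, a minimal sketch of a local-first workflow that composes Polars and DuckDB over Arrow. The file path, column names, and aggregation are hypothetical and only illustrate the pattern.

```python
# Minimal sketch of a local-first workflow mixing Polars and DuckDB
# over Arrow. File path and column names are hypothetical.
import duckdb
import polars as pl

# Vectorized transformation in Polars (lazy scan, no full load).
events = (
    pl.scan_parquet("data/events/*.parquet")
      .filter(pl.col("event_type") == "purchase")
      .group_by("user_id")
      .agg(pl.col("amount").sum().alias("total_spend"))
      .collect()
)

# DuckDB can query the Polars DataFrame in place via Arrow,
# so SQL and DataFrame steps compose without intermediate files.
top_users = duckdb.sql("""
    SELECT user_id, total_spend
    FROM events
    ORDER BY total_spend DESC
    LIMIT 10
""").pl()

print(top_users)
```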
Other
- Serve as a thought leader in the organization, guiding engineers on when to use lightweight tools vs. distributed platforms.
- Mentor senior and mid-level data engineers to accelerate their growth.
- Track record of leading architecture decisions and mentoring teams.
- Ability to set standards for maintainability and developer experience.
- The successful candidate's starting salary will be determined based on a number of non-discriminatory factors, including qualifications for the role, level, skills, experience, location, and internal equity relative to peers at DV.