Rula aims to make mental health care more accessible and effective. The Staff Data Engineer for Operational Reporting will design and implement a near real-time data platform to deliver critical operational reports and insights, enabling data-driven decisions to improve patient outcomes.
Requirements
- Data Pipeline Development (8+ yrs). Experience designing and maintaining scalable ETL/ELT pipelines for operational reporting using Kafka, Glue, dbt, Dagster, and Airflow.
- Experience leveraging Python and SQL for data transformation and quality checks, and working with Flink and Spark Streaming to build low-latency, near real-time pipelines.
- Cloud Infrastructure & Data Warehousing (8+ yrs overall, 4+ yrs in AWS). Proficiency building and optimizing data pipelines using AWS services such as S3, Redshift, Glue, IAM, Kinesis, and EMR.
- Experience across GCP (BigQuery, Dataflow) and Azure (Synapse, Data Factory).
- Optimizing data warehouses (Redshift, Snowflake, BigQuery) and managing Data Lakes (S3, Delta Lake) for scalable, low-latency analytics.
- Data Quality & Governance (8+ yrs). Experience implementing scalable data validation, quality checks (e.g., deduplication, consistency), and error-handling mechanisms tailored for operational reporting pipelines, ensuring high-fidelity data for real-time dashboards and analytics.
- Performance Optimization (3+ yrs). Experience optimizing data pipelines, queries, and large-scale datasets for efficiency and scalability in operational reporting systems, with a focus on achieving low-latency delivery.
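To make the data quality requirement concrete, here is a minimal sketch of the kind of validation layer described above (deduplication, consistency, and required-field checks) using pandas. The column names (`event_id`, `status`, `updated_at`) and rules are illustrative assumptions, not a description of Rula's actual schemas; in a production pipeline these checks would typically live in dbt tests or a pipeline task.

```python
import pandas as pd

def validate_events(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic operational-reporting quality checks:
    deduplicate, enforce consistency, and require key fields."""
    # Deduplication: keep only the latest record per event_id.
    df = df.sort_values("updated_at").drop_duplicates("event_id", keep="last")
    # Consistency: timestamps must not be in the future at load time.
    df = df[df["updated_at"] <= pd.Timestamp.now()]
    # Completeness: required fields must be present.
    df = df.dropna(subset=["event_id", "status"])
    return df.reset_index(drop=True)

events = pd.DataFrame({
    "event_id": [1, 1, 2, 3],
    "status": ["open", "closed", "open", None],
    "updated_at": pd.to_datetime(["2024-01-01", "2024-01-02",
                                  "2024-01-01", "2024-01-01"]),
})
clean = validate_events(events)
# event 1 keeps only its latest row; event 3 is dropped for a missing status
```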
Responsibilities
- Oversee the design and implementation of a greenfield near real-time data platform, starting with micro-batching pipelines using Kafka to deliver critical operational reports and evolving into a scalable Apache Flink architecture for sub-second analytics.
- Build fault-tolerant pipelines, ensure data accuracy, and optimize for low-latency delivery.
- Own a strategic transition from micro-batching to a Flink-based streaming architecture.
- Design and maintain scalable ETL/ELT pipelines for operational reporting using Kafka, Glue, dbt, Dagster, and Airflow.
- Leverage Python and SQL for data transformation and quality checks, and work with Flink and Spark Streaming to build low-latency, near real-time pipelines.
- Build and optimize data pipelines using AWS services such as S3, Redshift, Glue, IAM, Kinesis, and EMR.
- Implement scalable data validation, quality checks (e.g., deduplication, consistency), and error-handling mechanisms tailored for operational reporting pipelines, ensuring high-fidelity data for real-time dashboards and analytics.
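The micro-batching starting point mentioned above can be sketched in a few lines: group incoming records into batches that flush on either a size cap or a time window. This is an illustrative, dependency-free sketch of the pattern only; a real pipeline would consume from Kafka via a client library and hand batches to the downstream load step, and the parameter names here are assumptions.

```python
import time
from typing import Iterable, Iterator, List

def micro_batches(stream: Iterable,
                  max_size: int = 100,
                  max_wait_s: float = 1.0) -> Iterator[List]:
    """Group records into micro-batches, flushing when either the
    size cap or the time window is reached -- the batching pattern
    a platform can start with before moving to true streaming."""
    batch: List = []
    deadline = time.monotonic() + max_wait_s
    for record in stream:
        batch.append(record)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch = []
            deadline = time.monotonic() + max_wait_s
    if batch:  # flush any trailing partial batch
        yield batch

# With a long time window, only the size cap triggers flushes.
batches = list(micro_batches(range(10), max_size=4, max_wait_s=60))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Moving to a Flink-based architecture replaces this pull-and-batch loop with continuous operators and event-time windows, which is what enables the sub-second analytics the role targets.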
Other
- 100% remote work environment (US-based only)
- Working hours to support a healthy work-life balance, ensuring you can meet both professional and personal commitments
- Strong ability to work cross-functionally with business analysts, product managers, leadership, and other stakeholders to define and deliver operational reporting requirements.
- Exceptional communication skills to translate complex technical concepts into clear, actionable insights for non-technical audiences.
- Proven adaptability to thrive in a fast-paced startup environment, collaborating effectively to support the rapid development and evolution of a near real-time data platform in line with Rula's mission to improve mental health care outcomes.