The company aims to improve mental healthcare by developing AI-enabled experiences that enhance the human connection in therapy, making therapy more transparent, personalized, and accessible. The Data Engineer will support this mission by building and maintaining the data pipelines used to train machine learning models and AI tools that improve patient outcomes.
Requirements
- 8+ years of data pipeline development, specifically building and maintaining scalable ETL/ELT pipelines for ML/AI training workflows, using tools such as AWS Glue, dbt, Dagster, Spark, or Ray for distributed processing of large-scale structured and unstructured data from data lakes (a brief pipeline sketch follows this list)
- Strong proficiency in Spark, Python, and SQL for feature engineering, data transformation, and ensuring high-quality, versioned datasets suitable for model training and inference
- 8+ years of cloud infrastructure and data warehousing experience, including 4+ years focused on AWS, with proficiency in services such as Redshift, S3, Glue, IAM, EMR, and SageMaker for supporting ML/AI pipelines
- Experience optimizing data warehouses (e.g., Redshift, Snowflake, BigQuery) and managing data lakes (e.g., S3, GCS, Azure Blob) for large-scale, versioned ML training datasets, with a focus on partitioning, access controls, and integration with distributed processing frameworks like Spark
- Experience implementing scalable data validation, quality checks, and error-handling mechanisms tailored for ML/AI pipelines, including bias detection, anomaly identification, and dataset-integrity checks that ensure high-fidelity training data (see the validation sketch after this list)
- Experience with data security measures such as encryption, role-based access control, and data masking
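To make the day-to-day concrete, here is a minimal sketch of the kind of ETL step the role involves: reading raw records from a data lake, deriving a feature, and writing a partitioned, versioned training dataset back to S3. The bucket names, paths, column names, and path-based versioning scheme are all hypothetical.

```python
# Minimal PySpark ETL sketch; all paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("training-data-etl").getOrCreate()

# Read raw session records from the data lake (hypothetical path).
raw = spark.read.parquet("s3://example-data-lake/raw/sessions/")

features = (
    raw
    .filter(F.col("session_length_sec") > 0)          # drop malformed rows
    .withColumn("event_date", F.to_date("event_ts"))  # derive the partition column
    .withColumn("log_session_length",
                F.log1p("session_length_sec"))        # example engineered feature
)

# Partitioned, path-versioned output so training jobs can prune by date
# and pin an exact dataset version.
(features.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-ml-datasets/features/v1/"))
```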
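And a sketch of the validation and masking work mentioned above: simple integrity and distribution checks that fail the run before bad data reaches training, plus hashing of direct identifiers. The thresholds, column names, and SHA-256 masking choice are assumptions for illustration, not a prescribed implementation.

```python
# Illustrative quality gate and PII masking; thresholds and columns are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("training-data-checks").getOrCreate()
df = spark.read.parquet("s3://example-ml-datasets/features/v1/")  # hypothetical path

# Integrity checks: fail the pipeline run rather than train on bad data.
total = df.count()
null_labels = df.filter(F.col("label").isNull()).count()
assert total > 0, "dataset is empty"
assert null_labels / total < 0.01, f"too many null labels: {null_labels}/{total}"

# Crude distribution/bias check: no single cohort should dominate the dataset.
max_share = (df.groupBy("cohort").count()
               .agg(F.max(F.col("count") / F.lit(total)))
               .first()[0])
assert max_share < 0.8, "one cohort dominates the training data"

# Mask direct identifiers before data leaves the pipeline environment.
masked = df.withColumn("patient_id",
                       F.sha2(F.col("patient_id").cast("string"), 256))
masked.write.mode("overwrite").parquet("s3://example-ml-datasets/features/v1-masked/")
```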
Responsibilities
- Build and maintain the data pipelines that pull information from our central storage system to train machine learning models and AI tools, which includes designing reliable data flows, testing for accuracy, and solving unexpected challenges
- Develop scalable ETL/ELT pipelines for ML/AI training workflows, using tools such as AWS Glue, dbt, Dagster, Spark, or Ray to process large-scale structured and unstructured data from the data lake
- Implement scalable data validation, quality checks, and error handling for ML/AI pipelines, including bias detection and anomaly identification, to ensure high-fidelity training data (see the validation sketch under Requirements)
- Optimize data pipelines and queries and manage large datasets for efficiency and scalability (a short optimization sketch follows this list)
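A short sketch of what that optimization work can look like in Spark: filtering on the partition column so only the relevant files are read, and broadcasting a small lookup table to avoid a shuffle. The paths, table names, and date cutoff are assumptions.

```python
# Optimization sketch: partition pruning plus a broadcast join; paths assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pipeline-optimization").getOrCreate()

# Filtering on the partition column (event_date) lets Spark read only the
# matching S3 prefixes instead of scanning the whole dataset.
recent = (spark.read.parquet("s3://example-ml-datasets/features/v1/")
               .filter(F.col("event_date") >= "2024-01-01"))

# Broadcasting the small dimension table ships it to every executor and
# avoids shuffling the large fact table across the cluster.
clinics = spark.read.parquet("s3://example-data-lake/dim/clinics/")
joined = recent.join(F.broadcast(clinics), on="clinic_id", how="left")

joined.explain()  # inspect the physical plan to confirm pruning and the broadcast
```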
Other
- Strong ability to work cross-functionally with data analysts, data scientists, and stakeholders
- Effective communication skills to explain technical concepts to non-technical audiences
- Adaptability to thrive in a fast-paced startup environment
- 100% remote work environment (US-based only)
- Working hours that support a healthy work-life balance