Stitch Fix is evolving its modern data stack to support the next wave of AI/ML-powered products, analytics, and knowledge systems, making the data platform more scalable, easier to find information in, and more reliable as AI is built into its workflows.
Requirements
- You have 5-8 years of experience building cloud-scale data infrastructure or ML platforms
- You're hands-on with Spark, SQL, and Python and/or Scala, and have strong experience building scalable APIs and services
- You understand streaming systems such as Kafka or Flink and how to design for real-time insights
- You have experience with orchestration systems, CI/CD, and production monitoring
- You're interested in defining data product standards, governance-by-design, and AI observability metrics
Responsibilities
- Evolve our core data stack (Spark, Trino, Iceberg, Kafka, Flink) to meet the scale and latency demands of AI workloads
- Define and implement observability, lineage, and access standards for our data products and AI applications
- Support critical data pipelines while creating abstractions that simplify end-user interactions
- Unify workflows from data ingestion to model serving, ensuring a shared foundation for feature stores, ML observability, and semantic modeling
Other
- You're detail-oriented, curious, and passionate about building infrastructure people love to use
- We value integrity, innovation, and trust.
- We cultivate a community of diverse perspectives; all voices are heard and valued.
- We win as a team, commit to our work, and celebrate grit together because we value strong relationships.
- We boldly create the future while keeping equity and sustainability at the center of all that we do.