Major League Soccer (MLS) needs to evolve its Fan Genome platform to deliver hyper-personalization and real-time insights across every fan interaction. This means building a next-generation data platform that supports self-service BI and robust analytics engineering frameworks.
Requirements
- Hands-on expertise in designing, deploying, and optimizing cloud-native data solutions on platforms such as AWS, Azure, or GCP
- Deep understanding of modern data architecture patterns, including Lakehouse design, data mesh principles, and data quality monitoring frameworks
- Strong computer science fundamentals with proficiency in at least one advanced programming language (Python, Scala, or Java)
- Proven experience with distributed processing frameworks (e.g., Apache Spark, Apache Flink) and real-time streaming architectures
- Expertise in Lakehouse data platforms built on object storage and open table formats (e.g., Apache Hudi, Apache Iceberg) for ACID transactions, schema evolution, and incremental processing
- Proficiency in Infrastructure-as-Code, orchestration, transformation frameworks, containers, and observability tools
- Familiarity with data science and machine learning workflows, including feature engineering, model training pipelines, and integration with feature stores
Responsibilities
- Own the technical architecture and feature delivery of MLS’s next-generation cloud-native Lakehouse platform, ensuring scalability, performance, and reliability
- Optimize and enhance existing real-time data pipelines built on Apache Kafka, Amazon Kinesis, and Apache Flink to maintain low-latency ingestion and event-driven processing at scale
- Manage and improve distributed compute workflows leveraging Apache Spark for large-scale batch processing, advanced feature engineering, and ML-adjacent workloads
- Oversee and refine open table format implementations (Apache Hudi, Apache Iceberg) to ensure ACID compliance, schema evolution, and efficient incremental processing
- Drive performance tuning and cost optimization for zero-copy analytics on distributed, column-oriented MPP OLAP systems built for real-time, high-concurrency analytical workloads (e.g., StarRocks) and on query engines such as Presto
- Maintain and extend robust data APIs for both batch exports and point (per-fan) queries, integrated with Fan Genome’s feature store
- Advance identity resolution capabilities to ensure accurate, unified fan profiles across multiple data sources
Other
- Ability to build, mentor, and scale a world-class data and analytics engineering team, fostering a culture of technical excellence and innovation
- Demonstrated ability to translate complex business requirements into scalable technical solutions, collaborating with data management, security, and privacy teams to ensure compliance and governance
- High level of commitment to a quality work product and to organizational ethics, integrity, and compliance
- Ability to work effectively in a fast-paced team environment
- Strong interpersonal skills and the ability to communicate effectively, both verbally and in writing