Major League Soccer (MLS) is looking to evolve its Fan Genome platform by building a next-generation data platform that powers hyper-personalization and real-time insights across fan interactions, while also delivering BI self-service and robust analytics engineering frameworks.
Requirements
- Hands-on expertise in designing, deploying, and optimizing cloud-native data solutions on platforms such as AWS, Azure, or GCP
- Deep understanding of modern data architecture patterns, including Lakehouse design, data mesh principles, and data quality monitoring frameworks
- Strong computer science fundamentals with proficiency in at least one advanced programming language (Python, Scala, or Java)
- Proven experience with distributed processing frameworks (e.g., Apache Spark, Apache Flink) and real-time streaming architectures
- Expertise in Lakehouse data platforms built on object storage and open table formats (e.g., Apache Hudi, Apache Iceberg) for ACID transactions, schema evolution, and incremental processing
- Proficiency in Infrastructure-as-Code, orchestration, transformation frameworks, containers, and observability tools
- Deep BI and analytics expertise, including designing and implementing analytics engineering frameworks for governed, reusable data models
Responsibilities
- Own the technical architecture and feature delivery of MLS’s next-generation cloud-native Lakehouse platform, ensuring scalability, performance, and reliability.
- Optimize and enhance existing real-time data pipelines built on Apache Kafka, Amazon Kinesis, and Apache Flink to maintain low-latency ingestion and event-driven processing at scale.
- Manage and improve distributed compute workflows leveraging Apache Spark for large-scale batch processing, advanced feature engineering, and ML-adjacent workloads.
- Oversee and refine open table format implementations (Apache Hudi, Apache Iceberg) to ensure ACID compliance, schema evolution, and efficient incremental processing.
- Drive performance tuning and cost optimization for zero-copy analytics on distributed, column-oriented MPP OLAP systems designed for real-time, high-concurrency analytical workloads (e.g., StarRocks) and on query engines such as Presto.
- Maintain and extend robust data APIs for both batch exports and point (per-fan) queries, integrated with Fan Genome’s feature store.
- Advance identity resolution capabilities to ensure accurate, unified fan profiles across multiple data sources.
Other
- 10+ years of progressive experience in data engineering or platform engineering, including 8+ years in leadership roles with a proven track record of delivering production-grade, large-scale data and analytics platforms
- Demonstrated ability to translate complex business requirements into scalable technical solutions, collaborating with data management, security, and privacy teams to ensure compliance and governance
- High level of commitment to a quality work product and to organizational ethics, integrity, and compliance
- Ability to work effectively in a fast-paced team environment
- Strong interpersonal skills and the ability to communicate effectively, both verbally and in writing