LMArena is seeking a Software Engineer to build the data pipelines and infrastructure that power real-world AI evaluation, processing and analyzing tens of millions of user votes to understand and evaluate AI model performance.
Requirements
- 5+ years of experience in software engineering, with a dedicated focus on data engineering and big data technologies.
- Proficiency in SQL and at least one programming language commonly used for data analysis, such as Python (preferred), Scala, or R.
- Hands-on experience with data processing and pipeline frameworks (Apache Spark, Ray Data, etc.) and at least one popular big data analytics platform (Databricks, Snowflake).
- Demonstrated experience in designing, implementing, optimizing, and debugging production data pipelines.
- Prior work on data analytics or data lake platforms.
- Experience with advanced data tools such as Delta Lake and streaming tables.
- Exposure to machine learning is a plus.
Responsibilities
- Design and build robust data pipelines to ingest, process, and transform user vote data into the features essential for model performance evaluation.
- Collaborate with researchers and product leadership to understand product goals and the data needed to support them.
- Design and implement solutions to generate result dashboards and reports, providing useful information for the public, model providers, and researchers.
- Ensure the integrity and quality of the data and the reliability of the pipelines.
- Scale our data infrastructure to accommodate increasing data volumes and evolving analytical needs.
Other
- Thrives in fast-moving environments
- Interested in building products to ensure accurate and fair evaluation of human preferences across different models
- Partner closely with researchers, engineers, and product leadership
- Help us move fast while staying rigorous
- Competitive compensation and equity aligned to the markets where our team members are based.