Transform complex datasets into actionable insights, high-performing models, and scalable analytical workflows for a leading AI research lab.
Requirements
- Kaggle Competitions Grandmaster or comparable achievement: top-tier rankings, multiple medals, or exceptional competition performance
- Strong proficiency in Python and data tools (Pandas, NumPy, Polars, scikit-learn, etc.)
- Experience building ML models end-to-end: feature engineering, training, evaluation, and deployment
- Solid understanding of statistical methods, experiment design, and causal or quasi-experimental analysis
- Familiarity with modern data stacks: SQL, distributed datasets, dashboards, and experiment tracking tools
- Knowledge of LLMs, embeddings, and modern ML techniques for text, images, and multimodal data
- Experience working with big data ecosystems (Spark, Ray, Snowflake, BigQuery, etc.)
Responsibilities
- Analyze large, complex datasets to uncover patterns, develop insights, and inform modeling direction
- Build predictive models, statistical analyses, and machine learning pipelines across tabular, time-series, text, or multimodal data
- Design and implement robust validation strategies, experiment frameworks, and analytical methodologies
- Develop automated data workflows, feature pipelines, and reproducible research environments
- Conduct exploratory data analysis (EDA), hypothesis testing, and model-driven investigations to support research and product teams
- Translate modeling outcomes into clear recommendations for engineering, product, and leadership teams
- Collaborate with ML engineers to productionize models and ensure data workflows operate reliably at scale
Other
- 3–5+ years of experience in data science or applied analytics
- Excellent communication skills, including the ability to present analytical insights clearly
- Strong contributions across multiple Kaggle tracks (Notebooks, Datasets, Discussions)
- Experience in an AI lab, fintech, product analytics, or ML-focused organization
- Familiarity with statistical modeling frameworks such as Bayesian methods or probabilistic programming