AI Research Engineer – Datadog AI Research (DAIR)

Datadog

Salary not specified
Aug 21, 2025
New York, NY, US

Datadog is looking to take on high-risk, high-reward projects grounded in real-world challenges in cloud observability and security by turning research ideas into working systems.

Requirements

  • Strong software engineering skills with experience in domains such as observability, SRE, or security
  • Depth in distributed computing and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a plus
  • Proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and comfortable with modern cloud and data infrastructure
  • Practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration
  • Familiar with efficient training, fine-tuning, and inference techniques for large foundation models
  • Experience with GPU programming and optimization, including CUDA
  • Experience writing production data pipelines and applications

Responsibilities

  • Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling
  • Implement models, run experiments at scale, and profile for reliability, performance, and cost
  • Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery
  • Make the research stack observable, reproducible, and easier to use
  • Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks
  • Collaborate with Research Scientists, Product, and Engineering to integrate advanced AI capabilities into Datadog's product ecosystem and to harden prototypes into reliable services
  • Contribute high-quality code, documentation, and open-source artifacts that enable the community and internal teams to reproduce, extend, and evaluate results

Other

  • Bachelor's, Master's, or Ph.D. degree in a relevant field
  • Ability to explain design and performance trade-offs clearly to both technical and non-technical audiences
  • Strong interest in open-science and open-source contributions, including establishing rigorous benchmarks and sharing artifacts with the community
  • Ability to work in a collaborative environment and communicate effectively with colleagues
  • Passion for pushing the boundaries of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies