AI Research Engineer – Datadog AI Research (DAIR)

Datadog

Salary not specified

Aug 21, 2025

New York, NY, US

Datadog is looking to take on high-risk, high-reward projects grounded in real-world challenges in cloud observability and security, turning research ideas into working systems.

Requirements

Strong software engineering skills with experience in domains such as observability, SRE, or security
Depth in distributed computing and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a plus
Proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and comfortable with modern cloud and data infrastructure
Practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration
Familiar with efficient training, fine-tuning, and inference techniques for large foundation models
Experience with GPU programming and optimization, including CUDA
Experience writing production data pipelines and applications

Responsibilities

Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling
Implement models, run experiments at scale, and profile for reliability, performance, and cost
Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery
Make the research stack observable, reproducible, and easier to use
Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks
Collaborate with Research Scientists, Product, and Engineering to integrate advanced AI capabilities into Datadog's product ecosystem and to harden prototypes into reliable services
Contribute high-quality code, documentation, and open-source artifacts that enable the community and internal teams to reproduce, extend, and evaluate results

Other

Bachelor's, Master's, or Ph.D. degree in a relevant field
Ability to explain design and performance trade-offs clearly to both technical and non-technical audiences
Strong interest in open-science and open-source contributions, including establishing rigorous benchmarks and sharing artifacts with the community
Ability to work in a collaborative environment and communicate effectively with colleagues
Passion for pushing the boundaries of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies