Datadog is building an intelligent control plane for production systems. The goal is to move beyond passive monitoring to a platform where AI agents can safely and effectively take action in live environments, and to evolve observability infrastructure for stochastic, self-improving systems.
Requirements
- Strong software engineering foundation, ideally in C++, Rust, Go, or Python, and comfort writing performant, maintainable code
- Deep expertise in at least one of the following areas: query optimization, data center scheduling, compiler design, reinforcement learning, or distributed systems design
- Experience applying search, planning, or learning techniques to solve real-world optimization problems
- Hands-on experience in a production environment
- Experience with systems engineering, database internals, or infrastructure research
- Strong background in systems engineering, AI, and formal reasoning
- Expertise in areas like causal modeling, generative simulation, runtime verification, or reinforcement learning
Responsibilities
- Design and prototype intelligent systems for AI-native observability, including cost-aware agent orchestration, adaptive query execution, and self-optimizing system components.
- Apply reinforcement learning, search, or hybrid approaches to infrastructure-level decision-making, such as autoscaling, scheduling, or load shaping.
- Collaborate with AI researchers and platform engineers to design experimentation loops and verifiers that guide LLM outputs using runtime metrics and formal models.
- Explore emerging paradigms like AI compilers, “programming after code,” and runtime-aware prompt engineering to inform Datadog’s infrastructure and product design.
- Help define the direction of BitsEvolve, Datadog’s optimization agent, which uses LLMs and evolutionary search to discover code improvements, optimize GPU kernels, and tune configurations to improve performance.
- Partner with product teams and platform stakeholders to ensure scientific advances translate into measurable improvements in cost, performance, and observability depth.
- Apply search, planning, or learning techniques to solve real-world optimization problems.
Other
- BS/MS/PhD in a scientific field or equivalent experience
- 8+ years of experience in systems engineering, database internals, or infrastructure research
- Ability to work in a hybrid workplace and maintain work-life harmony
- Strong collaboration and communication skills to work across research, engineering, and product teams
- Hypothesis-driven mindset and enthusiasm for designing experiments and evaluation loops