Datadog is looking for an experienced engineer to join its Workflow Engine Team and help shape the future of Atlas, its platform for building reliable, long-running workflows as code. The role involves evolving Atlas's architecture, improving its performance and resilience, and making it the go-to workflow platform across Datadog, while also supporting Datadog's AI initiatives.
Requirements
- 8+ years of experience building large-scale, distributed systems in production
- Deep expertise in systems programming, workflow orchestration, or related domains (job scheduling, stream processing, etc.)
- Experience designing for durability and correctness in stateful systems
- Skilled at making architectural decisions and leading complex projects
- Fluent in at least one systems-level language (e.g., Go, Java, C++, Rust)
- Prior experience with Temporal or another workflow orchestration system
Responsibilities
- Design and implement high-scale, reliable, and durable workflow execution infrastructure on top of Temporal
- Lead the evolution of Atlas to meet Datadog's growing scale and reliability needs, running many millions of actions per minute
- Support Datadog's AI initiatives by evolving Atlas into the orchestration backbone for AI agents and enabling an AI-first development mindset internally
- Partner with platform and product teams to make Atlas the standard for orchestrating workflows company-wide
- Drive technical strategy for resilience, durability, and performance optimization
- Mentor engineers and foster best practices in distributed systems development
Other
- Collaborative, with a track record of mentoring and growing other engineers
- Passionate about technology and eager to grow your skills
- Work-life harmony
- Work excellence
- Professional development