Senior AI/ML Engineer II

DigitalOcean

$200,000 - $260,400

Aug 28, 2025

Denver, CO, USA

DigitalOcean is building the next generation of agentic applications on the GradientAI platform, where multi-agent systems of LLM-powered agents collaborate, make decisions, and adapt at scale. The company is looking for an individual to design robust, scalable, and safe agent workflows that empower developers to build sophisticated AI-driven systems with confidence.

Requirements

Strong software engineering background and deep expertise in generative AI, multi-agent system design, guardrails, monitoring, and evaluation methodologies.
Proven experience in software development at scale, with strong foundations in distributed systems, system design, and cloud-native engineering.
Hands-on experience in shipping AI/ML systems into production.
Experience driving observability, guardrails, and evaluation best practices for multi-agent workflows, ensuring visibility, safety, and continuous improvement.
Ability to balance engineering trade-offs (reliability, latency, cost) with business outcomes.
Strong command of software engineering practices: testing, CI/CD, code quality, scalable architectures, and distributed system design.
5+ years of relevant industry experience in software engineering and deploying agentic AI systems in production within high-growth environments.

Responsibilities

Architect and deliver production-grade agentic systems: multi-agent orchestration, workflow management, state/memory handling, and runtime governance.
Design and orchestrate modular, LLM-powered agents (e.g., Planner, Tool Executor, QA, Validator) using scalable orchestration patterns (sequential, router, parallel, map-reduce), with clear handoff protocols, shared memory, and structured communication.
Define and enforce guardrails and governance: prompt sanitization, access control, audit trails, threat modeling, and strategies for injection defense, hallucination control, misuse prevention, and compliance.
Establish evaluation and monitoring methods for multi-agent systems covering accuracy, safety, cost, and latency, using observability practices (logs, telemetry, tracing, capturing intermediate outputs) and feedback loops to continuously refine performance.
Build fine-tuning and deployment pipelines: supervised fine-tuning, inference optimization, post-deployment updates, and scaling hardened systems with retries, error handling, and fairness checks.
Rapidly define and deliver MCPs (Minimum Capable Products): identify minimal agent roles and orchestration logic, validate quickly, and expand iteratively into robust multi-agent applications.
Integrate seamlessly with the GradientAI platform, ensuring agents leverage DigitalOcean (DO) services (inference, KBs, Functions, storage, networking) for scale, reliability, and cost-efficiency.

Other

Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud.
If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here.
We value winning together—while learning, having fun, and making a profound difference for the dreamers and builders in the world.
Partner closely with UX and design teams to ensure agentic features deliver simple, intuitive, and developer-first experiences.
Mentor and support teammates in applying guardrails, governance, and orchestration patterns consistently across projects.