Senior AI/ML Engineer II

DigitalOcean

$200,000 - $260,400

Aug 28, 2025

Denver, CO, US

DigitalOcean is looking to build the next generation of agentic applications on the GradientAI platform, where multi-agent systems of LLM-powered agents collaborate, make decisions, and adapt at scale. The role calls for someone to design robust, scalable, and safe agent workflows that empower developers to build sophisticated AI-driven systems with confidence.

Requirements

Proven experience in software development at scale, with strong foundations in distributed systems, system design, and cloud-native engineering.
Hands-on experience in shipping AI/ML systems into production.
Experience driving observability, guardrails, and evaluation best practices for multi-agent workflows, ensuring visibility, safety, and continuous improvement.
Strong software engineering background and deep expertise in generative AI, multi-agent system design, guardrails, monitoring, and evaluation methodologies.
Experience with scalable orchestration patterns (sequential, router, parallel, map-reduce).
Knowledge of cloud-native engineering and distributed systems.
Experience with AI/ML systems and multi-agent systems.

Responsibilities

Architect and deliver production-grade agentic systems: multi-agent orchestration, workflow management, state/memory handling, and runtime governance.
Design and orchestrate modular, LLM-powered agents (e.g., Planner, Tool Executor, QA, Validator) using scalable orchestration patterns (sequential, router, parallel, map-reduce), with clear handoff protocols, shared memory, and structured communication.
Define and enforce guardrails and governance: prompt sanitization, access control, audit trails, threat modeling, and strategies for injection defense, hallucination control, misuse prevention, and compliance.
Establish evaluation and monitoring methods for multi-agent systems covering accuracy, safety, cost, and latency, leveraging observability practices (logs, telemetry, tracing, capturing intermediate outputs) and feedback loops to continuously refine performance.
Build fine-tuning and deployment pipelines: supervised fine-tuning, inference optimization, post-deployment updates, and scaling hardened systems with retries, error handling, and fairness checks.
Rapidly define and deliver MCPs (Minimum Capable Products): identify minimal agent roles and orchestration logic, validate quickly, and expand iteratively into robust multi-agent applications.
Integrate seamlessly with the GradientAI platform, ensuring agents leverage DigitalOcean (DO) services (inference, KBs, Functions, storage, networking) for scale, reliability, and cost-efficiency.

Other

5+ years of relevant industry experience in software engineering and deploying agentic AI systems in production within high-growth environments.
Ability to balance engineering trade-offs (reliability, latency, cost) with business outcomes.
Ability to work remotely.
Must be willing to participate in and support operational excellence.
Must be able to independently ship product features, from planning through launch and maintenance, with high autonomy.