Senior AI/ML Engineer II

DigitalOcean

$190,000 - $217,400
Aug 28, 2025
Seattle, WA, US

DigitalOcean is looking to build the next generation of agentic applications on the GradientAI platform, where multi-agent systems of LLM-powered agents collaborate, make decisions, and adapt at scale. The role calls for someone to design robust, scalable, and safe agent workflows that let developers build sophisticated AI-driven systems with confidence.

Requirements

  • Proven experience in software development at scale, with strong foundations in distributed systems, system design, and cloud-native engineering.
  • Hands-on experience in shipping AI/ML systems into production.
  • Ability to drive observability, guardrails, and evaluation best practices for multi-agent workflows, ensuring visibility, safety, and continuous improvement.
  • Strong software engineering background and deep expertise in generative AI, multi-agent system design, guardrails, monitoring, and evaluation methodologies.
  • Experience with scalable orchestration patterns (sequential, router, parallel, map-reduce).
  • Knowledge of cloud-native engineering and distributed systems.
  • Experience with AI/ML systems and multi-agent workflows.

Responsibilities

  • Architect and deliver production-grade agentic systems: multi-agent orchestration, workflow management, state/memory handling, and runtime governance.
  • Design and orchestrate modular, LLM-powered agents (e.g., Planner, Tool Executor, QA, Validator) using scalable orchestration patterns (sequential, router, parallel, map-reduce), with clear handoff protocols, shared memory, and structured communication.
  • Define and enforce guardrails and governance: prompt sanitization, access control, audit trails, threat modeling, and strategies for injection defense, hallucination control, misuse prevention, and compliance.
  • Establish evaluation and monitoring methods for multi-agent systems covering accuracy, safety, cost, and latency, using observability practices (logs, telemetry, tracing, capturing intermediate outputs) and feedback loops to continuously refine performance.
  • Build fine-tuning and deployment pipelines: supervised fine-tuning, inference optimization, post-deployment updates, and scaling hardened systems with retries, error handling, and fairness checks.
  • Rapidly define and deliver MCPs (Minimum Capable Products): identify minimal agent roles and orchestration logic, validate quickly, and expand iteratively into robust multi-agent applications.
  • Integrate seamlessly with the GradientAI platform, ensuring agents use DigitalOcean services (inference, KBs, Functions, storage, networking) for scale, reliability, and cost-efficiency.

Other

  • 5+ years of relevant industry experience in software engineering and deploying agentic AI systems in production within high-growth environments.
  • Ability to balance engineering trade-offs (reliability, latency, cost) with business outcomes.
  • Collaborate cross-functionally with product managers, infra teams, design and UX, and other engineers to ship features that developers adopt and trust.
  • Participate in and support operational excellence.
  • Ship product features from planning through launch and maintenance with high autonomy.