Senior AI/ML Engineer II

DigitalOcean

$210,000 - $260,000
Aug 28, 2025
San Francisco, CA, US

DigitalOcean is building the next generation of agentic applications on the GradientAI platform, where multi-agent systems of LLM-powered agents collaborate, make decisions, and adapt at scale. The company is looking for an individual to design robust, scalable, and safe agent workflows that empower developers to build sophisticated AI-driven systems with confidence.

Requirements

  • Strong software engineering background and deep expertise in generative AI, multi-agent system design, guardrails, monitoring, and evaluation methodologies.
  • Proven experience in software development at scale, with strong foundations in distributed systems, system design, and cloud-native engineering.
  • Hands-on experience in shipping AI/ML systems into production.
  • Strong software engineering practices: testing, CI/CD, code quality, scalable architectures, and distributed system design.
  • 5+ years of relevant industry experience in software engineering and deploying agentic AI systems in production within high-growth environments.

Responsibilities

  • Architect and deliver production-grade agentic systems: multi-agent orchestration, workflow management, state/memory handling, and runtime governance.
  • Design and orchestrate modular, LLM-powered agents (e.g., Planner, Tool Executor, QA, Validator) using scalable orchestration patterns (sequential, router, parallel, map-reduce), with clear handoff protocols, shared memory, and structured communication.
  • Define and enforce guardrails and governance: prompt sanitization, access control, audit trails, threat modeling, and strategies for injection defense, hallucination control, misuse prevention, and compliance.
  • Establish evaluation and monitoring methods for multi-agent systems covering accuracy, safety, cost, and latency, using observability practices (logs, telemetry, tracing, capturing intermediate outputs) and feedback loops to continuously refine performance.
  • Build fine-tuning and deployment pipelines: supervised fine-tuning, inference optimization, post-deployment updates, and scaling hardened systems with retries, error handling, and fairness checks.
  • Rapidly define and deliver MCPs (Minimum Capable Products): identify minimal agent roles and orchestration logic, validate quickly, and expand iteratively into robust multi-agent applications.
  • Integrate seamlessly with the GradientAI platform, ensuring agents leverage DigitalOcean services (inference, KBs, Functions, storage, networking) for scale, reliability, and cost-efficiency.

Other

  • We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.
  • Collaborate cross-functionally with product managers, infra teams, design and UX, and other engineers to ship features that developers adopt and trust.
  • Participate in and support operational excellence.
  • Independently ship product features from planning to launch to maintenance with high autonomy.
  • Collaborate with other engineers to find elegant architectures and solutions.