Staff Software Engineer, ML Tooling and Infrastructure

Boston Dynamics

Salary not specified
Oct 17, 2025
Waltham, MA, US

The Atlas team is developing the next generation of humanoid robotics and needs to build a robust, scalable, and efficient software foundation to accelerate development cycles for Large Behavior Models.

Requirements

  • 6+ years of professional experience designing, building, and maintaining production Python applications.
  • Proven experience deploying and optimizing neural network models in production or real-world environments.
  • Deep expertise with modern software development practices: build systems (like Bazel or Pants), monorepos, Docker, and Python packaging.
  • Strong familiarity with the ML ecosystem, including PyTorch, ONNX, and inference servers like NVIDIA Triton.
  • Hands-on experience implementing distributed (multi-GPU, multi-node) training on a compute cluster.
  • Proficiency with production-grade database systems (e.g., PostgreSQL), ORMs, and data orchestration tools (e.g., Airflow).
  • Experience in robotics, behavior learning, or computer vision (e.g., vision-language models, VLMs).

Responsibilities

  • Architect and Refactor: Take ownership of our Python-based training and inference infrastructure, relentlessly improving its quality, performance, and scalability.
  • Build with Quality: Implement comprehensive testing, champion best practices for code quality, and build automated CI/CD pipelines to ensure reliable deployment and validation.
  • Own MLOps: Design, build, and operate the MLOps infrastructure for our cutting-edge behavior models, focusing on reliability, reproducibility, and speed from training to deployment.
  • Enable Data Insights: Develop tools and dashboards for data collection, analysis, and visualization, empowering the team to make data-driven decisions.
  • Manage Data Flow: Design and maintain scalable data pipelines for ingesting, processing, and versioning massive datasets from our robotics fleet.
  • Optimize Performance: Improve and maintain tooling for both on-robot and off-robot model inference, focusing on latency, throughput, and efficiency.
  • Collaborate and Scale: Partner with central infrastructure teams to optimize shared resources (e.g., compute clusters) and drive improvements that benefit the entire organization.

Other

  • A Software Pragmatist: You are a software engineer first and foremost. You find joy in building tools, automating processes, and creating robust systems that make others more productive.
  • A Force Multiplier: You understand that great engineering is what turns brilliant ideas into reality. You are passionate about building systems that multiply the team's effectiveness, allowing them to experiment faster and more reliably. Your success is measured by the velocity and impact of the entire team.
  • Committed to Quality: You believe that testing, clean code, and solid architecture are not afterthoughts but are fundamental to moving fast and building things that last.
  • A Systems Thinker: You are comfortable working across the full stack, from data ingestion and databases to training clusters and on-device inference.
  • Familiarity with modern C++.
  • Experience with front-end or web development for building internal tools (e.g., React, Vue).