Baton, Ryder's in-house product development group, aims to redefine transportation and logistics by building category-defining software that enables intelligent, efficient, and cost-effective freight planning and execution. The Staff Software Engineer - Infrastructure role specifically addresses the need to enhance machine learning infrastructure for distributed systems and ML operations, enabling faster and more reliable deployment of ML models into production.
Requirements
- Advanced, Staff-level proficiency in Python.
- Python experience must come from a production environment where the code directly impacts operations.
- Experience in distributed computing, scalable ML infrastructure, and high-performance engineering.
- Experience scaling ML infrastructure across multiple teams and use cases.
- Experience implementing and serving ML algorithms.
- Track record of ensuring reproducibility, lineage, and experiment rigor.
- Ownership of end-to-end ML systems: training, deployment, features, monitoring, and rollback.
Responsibilities
- Build and scale distributed systems for ML training, serving, and inference.
- Design and implement real-time ML workflows that power core product features.
- Build robust distributed systems tailored for efficient ML training and seamless operational deployment.
- Streamline and manage both online and offline feature stores, optimizing feature engineering processes for greater efficiency.
- Improve real-time machine learning workflows to support dynamic decision-making and automate core operational processes.
- Lead the development of MLOps systems, including model deployment, monitoring, and experiment tracking.
- Write production-grade Python that operates at scale, with reliability and performance top of mind.
Other
- Hybrid work model.
- Leads design and delivery of large-scale ML or distributed systems.
- Sets technical direction and elevates ML engineering standards.
- Communicates vision and trade-offs across disciplines.
- Mentors other ML engineers on the team.