Torc is seeking to transform how the world moves freight by developing software for automated trucks. The Machine Learning Frameworks Team builds the core infrastructure that powers this innovation, enabling scalable training, evaluation, and deployment of our models. The Senior Software Engineer will lead the design and development of our machine learning training and data infrastructure, taking ownership of foundational components of Torc’s ML platform.
Requirements
- Strong proficiency in Python, with a deep understanding of software engineering best practices (testing, CI/CD, version control, code reviews, agile workflows).
- Demonstrated experience with ML frameworks such as PyTorch or TensorFlow; hands-on experience with PyTorch Lightning is a plus.
- Solid experience with distributed computing, including frameworks such as Ray or Spark.
- Strong knowledge of cloud services (AWS preferred), containerization (Docker), and orchestration (Kubernetes).
- Proven ability to design scalable ML training pipelines and APIs.
- Hands-on experience with ML Ops workflows, including dataset management, model registries, and automated training/evaluation pipelines.
- Expertise in GPU programming, CUDA optimization, or performance tuning for ML workloads.
Responsibilities
- Lead the design and development of distributed training frameworks built on Ray and PyTorch Lightning, enabling scalable model training across large datasets and multi-GPU/multi-node clusters.
- Architect and implement high-performance pipelines for data ingestion, transformation, and delivery to ML training and evaluation workflows.
- Own and evolve shared ML libraries that serve as the foundation for all ML development at Torc.
- Collaborate closely with research, perception, and planning teams to understand requirements and translate them into reusable infrastructure solutions.
- Improve developer productivity by building robust internal tools, APIs, and automation for ML Ops.
- Optimize system performance, GPU/CPU utilization, and cloud resource efficiency.
- Champion best practices in software engineering, cloud-native ML infrastructure, and distributed systems.
Other
- 5+ years of professional software engineering experience.
- Track record of leading projects, influencing cross-functional teams, and mentoring other engineers.
- Excellent communication and collaboration skills with the ability to align technical solutions to business needs.
- Participate in the on-call rotation to ensure the reliability of ML infrastructure services.
- Drive adoption of new technologies, evaluate trade-offs, and align technical decisions with long-term autonomy goals.