Mercor collaborates with the world’s leading AI research labs to build and train cutting-edge AI models. In this role, you will design and implement new PyTorch tensor operators in C++/ATen to support that work.
Requirements
- Deep expertise in PyTorch internals, including TensorIterator, the dispatcher, and the autograd engine.
- Strong skills in modern C++ (C++17 or later) and template metaprogramming within the PyTorch ecosystem.
- Experience creating or extending custom PyTorch ops or backend implementations.
- Familiarity with performance profiling tools and CPU-GPU interaction.
- Contributions to PyTorch or related open-source projects are highly valued.
Responsibilities
- Design and implement new PyTorch tensor operators in C++/ATen.
- Develop and validate Python bindings, ensuring correct gradient propagation and adequate test coverage.
- Create gold-standard reference implementations in eager mode for correctness assessment.
- Collaborate asynchronously with CUDA engineers to integrate kernel optimizations.
- Profile, benchmark, and report performance at operator and computational graph levels.
- Document APIs, assumptions, and performance characteristics for reproducibility.
Other
- Training support will be provided.
- Hourly contract.
- Remote.
- Flexible, asynchronous schedule.
- Excellent written communication and ability to deliver well-documented, modular code.