Herdora is building the future of inference, GPU optimization, and AI infrastructure. This role involves building the core systems that power its GPU optimization platform.
Requirements
- Deep understanding of GPU architectures, CUDA programming, and parallel computing patterns.
- Proficiency in PyTorch, TensorFlow, or JAX, particularly for GPU-accelerated workloads.
- Strong grounding in large language models (training, fine-tuning, prompting, evaluation).
- Proficiency in C++ and Python, with Rust or Go a plus, for building tooling around CUDA.
- Publications or open-source contributions in GPU computing, inference, or ML/AI for code are a plus.
- Hands-on experience with large-scale experiments, benchmarking, and performance tuning.
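As a flavor of the benchmarking work listed above, here is a minimal sketch of GPU kernel timing in PyTorch (a hypothetical helper, not Herdora's actual tooling): CUDA events measure on-device time after a warmup, with a wall-clock fallback when no GPU is present.

```python
import time
import torch

def benchmark(fn, warmup=3, iters=10):
    """Return the average time per call of fn, in milliseconds."""
    for _ in range(warmup):  # warm up caches, JIT, and allocator
        fn()
    if torch.cuda.is_available():
        # CUDA events time work on the GPU itself, avoiding host-side skew.
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait for all queued kernels to finish
        return start.elapsed_time(end) / iters
    # CPU fallback: plain wall-clock timing.
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000 / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)
ms = benchmark(lambda: a @ b)
print(f"matmul: {ms:.3f} ms/iter")
```

Synchronizing before reading the events matters: CUDA kernel launches are asynchronous, so timing them with host clocks alone under-reports the real cost.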
Responsibilities
- Build scalable infrastructure for AI model training and inference.
- Lead technical decisions and architecture choices.
Other