We are a stealth-mode startup building next-generation infrastructure for the AI industry, making advanced language models portable, efficient, and customizable for real-world deployments.
Requirements
- Strong programming skills in Python.
- Hands-on experience with PyTorch and the Hugging Face ecosystem (Transformers, Datasets, PEFT).
- Familiarity with LoRA/QLoRA or other parameter-efficient fine-tuning (PEFT) methods (see the sketch after this list).
- Understanding of mixed-precision training (FP16/BF16) and memory optimization techniques.
- Experience building production-ready training scripts (reproducibility, logging, error handling).
- Comfortable working in Linux GPU environments (CUDA, ROCm).
- Ability to collaborate with backend/frontend engineers who are not ML specialists.
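As a concrete reference point, here is a minimal sketch of that fine-tuning stack using Transformers + PEFT + bitsandbytes. The base model name, LoRA rank, and target module names are illustrative assumptions, not project specifics.

```python
# Minimal QLoRA-style setup: 4-bit quantized base model + LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # hypothetical base model

# 4-bit quantization (the "Q" in QLoRA) with BF16 compute for the forward pass
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# LoRA adapter: low-rank updates on the attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed names for a Llama-style model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically <1% of total parameters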
Responsibilities
- Implement and maintain LoRA/QLoRA fine-tuning pipelines using PyTorch + Hugging Face Transformers + PEFT.
- Develop logic for incremental training and adapter stacking, producing clean, versioned “delta packs.”
- Automate data preprocessing (tokenization, formatting, filtering) for user-supplied datasets.
- Build training scripts/workflows that integrate with orchestration backends (Node.js, REST/gRPC, or job queues).
- Implement monitoring hooks (loss curves, checkpoints, eval metrics) that feed into dashboards (see the sketch after this list).
- Collaborate with DevOps to ensure reproducible, portable training environments.
- Write tests to guarantee reproducibility and correctness of adapter outputs.
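For illustration, here is a minimal sketch of a monitoring hook and a versioned adapter export, assuming the Hugging Face Trainer callback API; the dashboard transport and the "delta pack" layout shown are hypothetical, not a fixed spec.

```python
import json
from pathlib import Path

from transformers import TrainerCallback


class DashboardCallback(TrainerCallback):
    """Forward training/eval metrics to an external dashboard (stubbed here)."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs:
            # Replace this print with a POST to your metrics backend or job queue.
            print(f"step {state.global_step}: {logs}")


def export_delta_pack(peft_model, out_dir: str, version: str, base_model: str) -> Path:
    """Save LoRA adapter weights plus a metadata manifest as a versioned 'delta pack'."""
    pack = Path(out_dir) / f"delta-{version}"
    peft_model.save_pretrained(str(pack))  # writes adapter_config.json + adapter weights
    (pack / "manifest.json").write_text(
        json.dumps({"version": version, "base_model": base_model}, indent=2)
    )
    return pack
```

The callback would be passed via Trainer(callbacks=[DashboardCallback()]); the actual metrics transport and manifest fields would depend on the orchestration backend.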
Other
- This position is open only to U.S. citizens or green card holders based in Austin, Texas.
- Willingness to be in the office occasionally for discussions and team collaboration.
- Competitive compensation, equity potential, and flexible remote work.