Sciforium is building a proprietary, high-efficiency serving platform to efficiently serve next-generation multimodal AI models and real-time applications. In this role, you will architect the platform and lead its development, bringing a multimodal, highly efficient foundation model to market.
Requirements
- 5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure.
- Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache).
- Experience with Kubernetes/Ray and containerization.
- Strong proficiency in C++ and Python.
- Strong debugging, profiling, and performance optimization skills at the system level.
- Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems.
- Proficiency in CUDA or ROCm and experience with GPU profiling tools.
Responsibilities
- Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution.
- Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems.
- Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes.
- Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference.
- Build Python APIs and services that expose model capabilities to downstream applications.
- Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance.
- Drive performance profiling, benchmarking, and observability across the inference stack.
Other
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
- Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality.
- Comfortable working from the office and contributing to a fast-moving, high-ownership team culture.
- Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team.
- Competitive salary and equity.